Design an Introductory Lesson: ELIZA to Modern Chatbots — A Historical Coding Lab
A classroom-ready lab tracing chatbots from ELIZA to modern assistants — coding labs, hallucination tests, and ethics prompts for 2026.
Hook: Teach Students Why Chatbots Sometimes “Make Stuff Up” — And How To Test It
If your students groan when asked to explain how modern chatbots work, this lesson will change that. Over one or two class sessions, you'll guide them from ELIZA — the 1960s pattern-matching “therapist” — to modern retrieval-augmented generation (RAG) and instruction-tuned assistants. They'll write code, run experiments that reveal hallucinations, debate ethical limits, and leave with practical tools for designing safer bots. This is a classroom-ready lab built for 2026 realities: on-device and low-parameter models, retrieval-first architectures, and new evaluation tools that emerged in late 2025 and early 2026.
Why This Lesson Matters in 2026
Students need more than usage tips — they need to understand limitations. Recent classroom experiments (see a Jan 16, 2026 EdSurge writeup) show that when middle schoolers chatted with ELIZA, they quickly learned how simple rule-based systems can seem intelligent, and how modern assistants can still fail in specific ways. In 2025–2026 the field moved rapidly toward:
- Retrieval-augmented generation (RAG) and grounding as the primary mitigation for hallucinations.
- On-device and low-parameter models for privacy-preserving demos in classrooms.
- Better evaluation toolchains (automated factuality checkers, calibration metrics) released by research groups in late 2025.
Learning Objectives
- Trace the technical evolution from ELIZA-style rule systems to modern AI assistants.
- Build a working ELIZA clone and a simple retrieval-augmented chatbot.
- Design experiments that reveal hallucinations and measure factuality.
- Lead an evidence-based discussion on AI ethics, bias, and trust.
Classroom Logistics: Formats & Materials
Two formats — choose based on time and resources:
- Single extended lab (90–120 minutes): Quick ELIZA build + RAG demo + debrief + mini assignment.
- Two-session module (2 × 50 minutes): Session 1 — ELIZA and ethics discussion. Session 2 — RAG lab, hallucination tests, presentations.
Required materials:
- Laptops with Python 3.10+ or Google Colab access
- Internet access for hosted model APIs or micro-VMs for on-device inference
- Optional: access to a small vector database (FAISS or Qdrant) — we include a fallback that requires no hosted service
Lesson Plan Overview (90-minute example)
- 10 min: Hook — short ELIZA demo and ask students whether the bot “understands”.
- 20 min: Build ELIZA in pairs (guided code).
- 25 min: RAG demo and guided lab — integrate a small knowledge base and query a modern assistant (hosted or local).
- 20 min: Hallucination testing assignment and group work (design prompts, run tests).
- 15 min: Discussion — ethics, bias, overtrust, and wrap-up.
Lab 1 — Build ELIZA: Rules, Patterns, and Reflection
ELIZA (Weizenbaum, 1966) is a classic case study: its therapist persona convinces people because of clever pattern matching and reflective replies. The code below is intentionally small so students can reason about every line.
# Minimal ELIZA-style chatbot (Python 3)
import random
import re
# Each rule is a (pattern, responses) pair; the first pattern that matches wins.
rules = [
    (r"I need (.*)", [
        "Why do you need {0}?",
        "Would it really help you to get {0}?",
    ]),
    (r"Why don't you (.*)", [
        "Do you really think I don't {0}?",
    ]),
    (r"Why can't I (.*)", [
        "What makes you think you should be able to {0}?",
    ]),
    (r"(.*)", [
        "Tell me more about that.",
        "How does that make you feel?",
    ]),
]
def eliza_reply(text):
    # Match case-insensitively so "i need..." and "I need..." behave the same.
    for pattern, responses in rules:
        m = re.match(pattern, text, re.IGNORECASE)
        if m:
            return random.choice(responses).format(*m.groups())
    return "I see."  # unreachable because of the catch-all rule, kept as a safety net
if __name__ == '__main__':
    print("ELIZA: Hello — tell me about your day.")
    while True:
        user = input('You: ')
        if user.lower() in ['quit', 'exit']:
            print('ELIZA: Goodbye.')
            break
        print('ELIZA:', eliza_reply(user))
Teaching Points for ELIZA Lab
- Transparency: Show how each response is selected and why it can appear meaningful despite no understanding.
- Limits: Let students try leading questions and observe failure modes (contradictions, lack of factual grounding).
- Extension: Ask students to add a small knowledge-based response (a few templates) and reflect on brittleness.
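For the extension, a minimal sketch of a hand-written "knowledge" rule could look like the snippet below; the fact is just an illustration, and the rule is inserted before the catch-all (r"(.*)", ...) entry so it gets a chance to match.
# Extension sketch: one hand-written factual rule, inserted before the catch-all.
# The fact is a placeholder; students should add their own.
rules.insert(len(rules) - 1, (
    r"What is the capital of France\??",
    ["The capital of France is Paris."],
))
Asking the question with even slightly different wording ("France's capital?", "capital city of France") makes the brittleness of pattern-based knowledge immediately visible.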
Lab 2 — From Pattern Matching to Grounded Assistants (RAG Demo)
Modern assistants often use a pipeline: retrieve relevant documents, then generate an answer conditioned on those documents. This dramatically reduces hallucinations when the retriever finds the right evidence. Below is a classroom-ready example using a lightweight vector index and an instruction-tuned model (you can run this on hosted APIs or local open models in 2026).
# Pseudocode / classroom-ready Python for RAG
# Install: pip install sentence-transformers faiss-cpu transformers
from sentence_transformers import SentenceTransformer
import faiss
# 1) Build embeddings for a small KB (three short facts)
kb = [
    "The Eiffel Tower is in Paris.",
    "Python was created by Guido van Rossum.",
    "The moon orbits Earth and causes tides.",
]
embed_model = SentenceTransformer('all-MiniLM-L6-v2')
embs = embed_model.encode(kb)              # numpy array, one embedding per fact
index = faiss.IndexFlatL2(embs.shape[1])   # exact L2 search over the tiny KB
index.add(embs)
# 2) Retrieve top doc for a query
query = "Who created Python?"
q_emb = embed_model.encode([query])
D, I = index.search(q_emb, 1)              # distances and indices of the top match
retrieved = kb[I[0][0]]
print('Retrieved:', retrieved)
# 3) Call a small instruction model (pseudocode):
prompt = f"Use the following source to answer the question.\nSource: {retrieved}\nQuestion: {query}\nAnswer:"
# send prompt to an instruction-tuned model (local or API) and print answer
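To make step 3 runnable in class without committing to a particular hosted API, one option (a sketch, assuming the transformers library is installed and a small instruction-tuned checkpoint such as google/flan-t5-small is acceptable on school machines) is:
# 3b) Example generation step using a small local instruction-tuned model.
# google/flan-t5-small is only an illustrative choice; swap in any small model you trust.
from transformers import pipeline
generator = pipeline("text2text-generation", model="google/flan-t5-small")
answer = generator(prompt, max_new_tokens=50)[0]["generated_text"]
print("Answer:", answer)
A quick in-class comparison is to rerun the same model with the Source line removed from the prompt and let students see how often the ungrounded answer drifts.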
Classroom Variations
- Use a free Colab + hosted sentence transformer for schools without robust compute.
- Swap the model call for an on-device LLM (one of 2026's lightweight models) to keep data local.
- Make the KB a student-created set of documents (e.g., a one-page summary per group).
Assignment: Design Hallucination Tests (Practical, Automated)
This assignment asks students to create test suites that provoke hallucinations and measure a bot's factuality. It's hands-on, scalable, and emphasizes reproducible evaluation.
Step-by-step student deliverables
- Create a 20-question dataset split into categories: factual, temporal, commonsense, and adversarial (contrived false facts).
- Run each question through: (a) your ELIZA clone, (b) the RAG system, and (c) a modern hosted assistant (if permitted by school policy).
- Record responses, identify hallucinations (false or unsupported claims), and cite source evidence when applicable.
- Produce a short report with statistics: % hallucinations per system and two example failure cases per system.
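A minimal harness for the second step (a sketch that reuses eliza_reply from Lab 1; rag_answer and hosted_answer are hypothetical wrappers students write around their own systems) can log every response to a CSV file for later labeling:
# Test-harness sketch: run every question through every system and log the answers.
import csv
def rag_answer(question):
    return "TODO: retrieve a source, then generate a grounded answer"
def hosted_answer(question):
    return "TODO: call the hosted assistant, if school policy allows it"
questions = [
    ("factual", "Who invented the telephone?"),
    ("temporal", "Who was the US president in 1976?"),
    # students extend this to the full 20 (category, question) pairs
]
systems = {"eliza": eliza_reply, "rag": rag_answer, "hosted": hosted_answer}
with open("hallucination_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["category", "question", "system", "response"])
    for category, question in questions:
        for name, ask in systems.items():
            writer.writerow([category, question, name, ask(question)])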
Sample prompt categories (20 prompts)
- Factual: "Who invented the telephone?"
- Temporal: "Who was the US president in 1976?"
- Commonsense: "If I drop a vase, what will likely happen?"
- Adversarial: "List three awards that [famous living person] has won" (asked about someone who has not won any such awards, so any confident list is a fabrication).
Automated grading rubric
- Correct and sourced: 2 points
- Correct but unsourced: 1 point
- Incorrect or fabricated (hallucination): 0 points
Use simple automation: run outputs through an entailment / NLI model (available in 2026 toolkits) to flag unsupported claims automatically, then perform manual spot checks.
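As a sketch of that automation, here is one way to flag unsupported claims, assuming the sentence-transformers CrossEncoder class and the cross-encoder/nli-deberta-v3-base checkpoint (any NLI model with a similar interface will do):
# Entailment check: does the retrieved source support the bot's claim?
# Label order is assumed to be [contradiction, entailment, neutral]; verify it on the model card.
from sentence_transformers import CrossEncoder
nli = CrossEncoder("cross-encoder/nli-deberta-v3-base")
def is_supported(source, claim, threshold=0.5):
    probs = nli.predict([(source, claim)], apply_softmax=True)[0]
    return probs[1] >= threshold  # probability mass on "entailment"
print(is_supported("Python was created by Guido van Rossum.",
                   "Guido van Rossum created Python."))
The 0.5 threshold is only a starting point; tuning it against the manual spot checks is itself a good student exercise.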
Discussion Prompts: Ethics, Bias, Trust, and Overreach
Turn the labs into evidence-based conversations. These prompts are designed for 20–30 minute group debates or Socratic seminars.
- "ELIZA made people feel heard despite no understanding. Is it ethical to design interfaces that intentionally trigger empathy?"
- "If a chatbot hallucinates and a user acts on that information, who is responsible?"
- "Given the CES 2026 trend of AI as a marketing hook, how should engineers and product teams avoid AI-washing?" (Reference: CES 2026 commentary that noted superficial AI productization.)
- "Should chatbots disclose their knowledge sources? If so, how detailed should the disclosure be?"
- "What are the privacy risks of using personal data to ground responses?"
Roleplay Activity
Assign roles: Developer, Ethicist, Product Manager, End User. Each group proposes a release plan for a classroom assistant that helps with homework, and must defend its choices on data handling, transparency, and fallback behavior when the assistant is uncertain.
Assessment & Rubrics
Assess students on three axes:
- Technical understanding (40%): functioning ELIZA and RAG demo, clean code, short reflection on behavior.
- Evaluation rigor (30%): quality of hallucination test suite, correct labeling, and statistics.
- Ethical reasoning & communication (30%): quality of debate participation, roleplay deliverables, and recommendations.
Advanced Extensions (For AP / Undergraduate Classes)
- Fine-tune a small open-source model (2025–26 techniques) on in-domain documents and measure hallucination reduction versus RAG.
- Build a confidence-calibrated assistant: add a check that decides when to decline to answer (see the sketch after this list).
- Replicate a known hallucination case study from research (use 2025–2026 papers and frameworks) and propose mitigation experiments.
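One lightweight way to prototype the decline-to-answer behavior is to threshold the retrieval distance from the Lab 2 index: if the nearest document is far from the query, the bot refuses instead of generating. This sketch reuses embed_model, index, and kb from Lab 2; MAX_DISTANCE is an arbitrary placeholder that students should tune on their own KB.
# Confidence-gated answering: decline when retrieval looks unreliable.
MAX_DISTANCE = 1.0  # placeholder threshold; tune it on the class KB
def answer_or_decline(question):
    q_emb = embed_model.encode([question])
    D, I = index.search(q_emb, 1)
    if D[0][0] > MAX_DISTANCE:
        return "I don't have a reliable source for that, so I'd rather not guess."
    return f"Based on my notes: {kb[I[0][0]]}"
print(answer_or_decline("Who created Python?"))
print(answer_or_decline("What is the airspeed of an unladen swallow?"))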
Practical Tips & Troubleshooting (Classroom Tested)
- Start with ELIZA to build intuition — students immediately see how surface-level conversational tricks work.
- Use small KBs (5–20 facts) so retrieval successes and failures are visible and interpretable.
- For privacy, prefer on-device or cohort-hosted models rather than sending student prompts to public APIs.
- When running hallucination tests against a hosted assistant, limit personal data — follow school policy and parental consent.
- To grade at scale, automate entailment checks but include human review for edge cases.
Real-World Examples & Case Studies
Use the EdSurge Jan 16, 2026 classroom experiment as a case study: middle schoolers discovered that ELIZA’s “therapist” replies came from simple rules, and they contrasted that with modern assistants that sometimes produce confident but incorrect facts. Also discuss industry patterns seen at CES 2026 where many products applied “AI” as a marketing label — a useful reminder that novelty ≠ problem-solving. These cases strengthen students' ability to critique not just code but commercial claims.
Resources & Further Reading (2025–2026)
- EdSurge (Jan 16, 2026): classroom ELIZA experiment
- Recent 2025–2026 research on factuality and hallucination mitigation (look for model cards and benchmark suites released in late 2025)
- Open-source toolkits: sentence-transformers, FAISS/Qdrant, Hugging Face; model evaluation kits for factuality and NLI
Actionable Takeaways for Teachers
- Start small: ELIZA is the perfect first project to demystify conversational AI.
- Use RAG to demonstrate how grounding reduces hallucinations — let students build the KB to increase engagement.
- Make evaluation part of the grade: testing for hallucinations teaches critical thinking about outputs, not just functionality.
- Facilitate an ethics debate grounded in lab results — evidence-based ethics is more powerful than hypotheticals.
Final Notes: The Future of Chatbot Education in 2026
By 2026, the conversation has shifted from “what AI can do” to “what AI should do.” Classroom labs that combine code, tests for hallucinations, and ethics discussion give students the mental models and practical skills needed for that shift. They learn to design for transparency, to prefer grounded approaches like RAG, and to question hype — whether it shows up at CES or in app descriptions. Use this lesson as a foundation: iterate on KB size, add newer models as they arrive, and keep the conversation anchored in evidence.
Call to Action
Ready to run this module? Download the full lesson packet (code, slides, student handouts, and grading rubric) from our teacher resource hub or sign up for a live workshop where we walk through the labs step by step. Try the ELIZA lab today and share your students’ top hallucination examples — we’ll feature thoughtful classroom case studies on equations.top.