Create an ELIZA vs. Modern Chatbot Classroom Demo: Visualize Rule-Based vs. ML Approaches
2026-02-10 12:00:00
10 min read

Hands-on classroom demo contrasting ELIZA's rule-based replies with transformer ML chatbots. Visualize the failures, hallucinations, and math behind both.

Hook: Stop students from treating chatbots like magic — make the mechanics visible

Students and teachers often see modern chatbots as black boxes that either “know everything” or “make things up.” That confusion wastes classroom time and erodes trust. This classroom demo shows how to visualize and compare ELIZA’s rule-based mechanics and a transformer-based ML chatbot, exposing clear failure modes — from brittle pattern matching to convincing hallucinations — and giving learners hands-on ways to reason about both systems.

The core idea (inverted pyramid): why this demo matters now

By 2026, AI literacy is part of many curricula and school policies. Recent discussions (late 2025–early 2026) pushed educators to teach not just how to use AI but how it works and fails. This demo helps students:

  • See how a 1960s rule-based bot like ELIZA maps input patterns to replies;
  • Inspect a modern transformer producing fluent answers, and where probability-driven outputs go wrong (hallucinations);
  • Build intuition with visualizations: pattern matches, attention heatmaps, token probability charts, and failure-rate graphs.

What you’ll build in class: a side-by-side interactive demo

The demo contains three synchronized panels: an ELIZA engine, a transformer model (small open-weight model or hosted API), and a visualization console. Students type the same prompts and witness contrasting behaviors. Key panels:

  • Transcript panel: live chat logs for both agents;
  • Rule inspector: highlights which ELIZA pattern fired and shows template substitution;
  • Model internals: token logits, softmax distributions, and attention heatmaps for the transformer;
  • Failure tracker: records when answers are unverifiable or inconsistent (hallucinations, contradictions, or pattern mismatches).

Why these visuals work

ELIZA’s decisions are traceable — every reply points back to a rule. Transformers are probabilistic: you can’t point to a single rule, but you can visualize internal signals (attention and token probabilities) that explain why a token was chosen.

Mathematical representations: put both systems on a common language

To teach rigor, model both systems as mathematical mappings from input text to output text. Use these concise formulas in class:

ELIZA (rule-based) as pattern mapping

Treat ELIZA as a set of ordered rules R = {r1, r2, ..., rN}. Each rule r consists of a pattern p (often a regular expression) and a response template t.

Define a matching function M(x):

M(x) = r_k where k = min{i | p_i matches x}. If no p_i matches, use a default template.

Response generation is template substitution S:

y = S(r_k, x)

ELIZA therefore computes:

f_ELIZA(x) = S(M(x), x)

This is deterministic given the rule order and substitution logic.
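As a minimal sketch, the whole pipeline f_ELIZA(x) = S(M(x), x) fits in a few lines of Python; the patterns and templates below are illustrative placeholders, not Weizenbaum's original script.

```python
import re

# Ordered rule list R = [(pattern p_i, template t_i), ...] -- first match wins.
RULES = [
    (re.compile(r"\bI need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bI am (.*)", re.I),   "How long have you been {0}?"),
    (re.compile(r"\bmy (\w+)", re.I),    "Tell me more about your {0}."),
]
DEFAULT = "Please go on."

def match(x):
    """M(x): return the first (rule, match) pair whose pattern fires, else (None, None)."""
    for pattern, template in RULES:
        m = pattern.search(x)
        if m:
            return (pattern, template), m
    return None, None

def respond(x):
    """f_ELIZA(x) = S(M(x), x): substitute captured groups into the template."""
    rule, m = match(x)
    if rule is None:
        return DEFAULT
    return rule[1].format(*m.groups())

print(respond("I need a break"))        # -> "Why do you need a break?"
print(respond("The weather is nice"))   # -> "Please go on."
```

The rule-inspector panel only needs the (pattern, match) pair returned by match() to highlight which regex fired and what it captured.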

Transformer (ML) as autoregressive probabilistic model

A transformer estimates a distribution over next tokens conditioned on previous tokens. For an input prompt x (tokenized into tokens x_1...x_m), the model computes hidden states h_i and predicts token probabilities:

P(y_i | y_{<i}, x) = softmax( W h_i )

where y_{<i} denotes the tokens generated so far and W projects the hidden state onto the vocabulary.

Attention (scaled dot-product) is a core subroutine:

Attention(Q,K,V) = softmax( QK^T / sqrt(d_k) ) V
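In the visualization console you can compute the same quantity with a few lines of NumPy; the 4×8 random Q, K, V matrices below are toy placeholders, not weights from a real model.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # token-by-token similarity matrix
    weights = softmax(scores, axis=-1)          # rows sum to 1 -- this is the heatmap
    return weights @ V, weights

rng = np.random.default_rng(0)                  # toy example: 4 tokens, d_k = 8
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, attn = attention(Q, K, V)
print(attn.round(2))                            # the matrix you would render as a heatmap
```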

Overall, the model is a composition of layers L, so h = L(x); it is trained to maximize likelihood and generates output y autoregressively at inference time. The most likely sequence is approximated by:

f_TRANS(x) ≈ argmax_y Π_i P(y_i | y_{<i}, x)

In practice, sampling with temperature τ, top-k, or nucleus (top-p) changes behavior:

P_sample ∝ softmax( (logits) / τ )
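The sketch below shows how τ and top-k reshape a toy logit vector; the four logits are invented purely for illustration.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample a token index from logits using temperature and optional top-k filtering."""
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=float) / max(temperature, 1e-8)  # P_sample ∝ softmax(logits / τ)
    if top_k is not None:
        cutoff = np.sort(z)[-top_k]             # keep only the k largest logits
        z = np.where(z >= cutoff, z, -np.inf)
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

logits = [4.0, 3.5, 1.0, 0.2]                   # hypothetical scores for four candidate tokens
for tau in (0.2, 1.0, 2.0):
    _, p = sample_next_token(logits, temperature=tau)
    print(f"tau={tau}: {p.round(3)}")           # low τ sharpens, high τ flattens the distribution
```

Students can rerun the loop with top_k=2 and watch the tail of the distribution disappear entirely.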

Bridging the notation — a teacher-friendly summary

ELIZA: f_ELIZA is rule selection + substitution; deterministic, explainable per reply. Transformer: f_TRANS is probabilistic, distributed across many parameters, explainable via visualized internal signals but not by single rules.

Constructing the classroom demo: step-by-step

Use these practical steps to prepare a 45–90 minute lesson that scales from middle school to high school and undergrad intro AI courses.

  1. Set up environments
    • ELIZA: lightweight Python or JavaScript implementation (pattern list and templates).
    • Transformer: a compact local model or a hosted API — for privacy and cost, prefer on-prem or sovereign hosting when working with student data and institutional policies (a minimal harness sketch follows this list).
  2. Create synchronized UI
    • Input box feeds both engines; side-by-side transcript panels record replies. Consider composable UX pipelines to keep the interface responsive and modular.
    • Visualization column shows rule matches for ELIZA and token-level internals for the transformer; design the column using principles from operational dashboard design so instructors can focus attention where failures appear.
  3. Prepare prompt sets
    • Controlled probes: simple questions, ambiguous statements, factual prompts, and adversarial puzzles.
    • Open-ended prompts to show fluency vs. factuality tradeoffs.
  4. Run live experiments
    • Students vote on prompts, record differences, and hypothesize causes. Use live sync techniques inspired by realtime workroom patterns for a smooth collaborative demo.
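A minimal backend for the synchronized UI might look like the sketch below, assuming the respond() function from the ELIZA sketch above (imported here from a hypothetical eliza_demo module) and a small open-weight model served through the Hugging Face transformers pipeline; distilgpt2 is only an example choice.

```python
from transformers import pipeline

try:
    from eliza_demo import respond              # the minimal ELIZA engine sketched earlier
except ImportError:
    def respond(x):                             # placeholder so this file runs standalone
        return "Please go on."

generator = pipeline("text-generation", model="distilgpt2")
transcript = []                                 # the transcript panel renders this list

def run_turn(prompt):
    """Feed one prompt to both engines and log the paired replies."""
    eliza_reply = respond(prompt)               # deterministic rule path
    ml_reply = generator(prompt, max_new_tokens=40, do_sample=True,
                         temperature=0.8)[0]["generated_text"]
    turn = {"prompt": prompt, "eliza": eliza_reply, "transformer": ml_reply}
    transcript.append(turn)
    return turn

print(run_turn("I am worried about my exam"))
```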

Visualization recipes (graphs & animations)

Good visualizations make abstract failure modes concrete. Here are five visualizations to build or embed in the demo.

  • Rule match trace: highlight which pattern and captured groups ELIZA used; animate substitutions so students see variable binding in real time.
  • Token probability bar chart: for each generated token, show the top-10 token probabilities and how temperature/top-k change the distribution.
  • Attention heatmap: animate attention matrices across layers for a sample reply; let students toggle to see early vs. late layers. If you're streaming these visuals in class, techniques from hybrid studio ops (low-latency capture and encoding) can keep the animation snappy.
  • Perplexity/time plot: plot model perplexity or normalized confidence over the transcript; spikes often precede hallucinations.
  • Failure-map timeline: mark chat turns where the response is factually incorrect, contradictory, or breaks conversational coherence. Logging patterns and annotations here benefit from practices in ethical data pipelines — keep audit trails for teacher verification.

Animate to teach causality

Make animations that step token-by-token: when the transformer picks a token, show the bar chart and attention snapshot at that moment. When ELIZA fires a rule, show the matching regex region in the input and the binding keys. This stepwise reveal builds causal intuition. Streaming these snapshots reliably in a classroom parallels some low-latency visualization techniques described in mobile studio and hybrid operations write-ups.
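To draw the per-step bar chart described above, all you need are the candidate tokens and logits at the current position; the values below are invented for a prompt like "The Eiffel Tower is in ...", so swap in the scores your model actually returns.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_top_k(logits, token_labels, k=10, temperature=1.0):
    """Bar chart of the top-k next-token probabilities at a given temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1][:k]
    plt.bar([token_labels[i] for i in order], probs[order])
    plt.ylabel("P(token)")
    plt.title(f"Top-{k} next-token probabilities (temperature={temperature})")
    plt.show()

tokens = ["Paris", "France", "the", "a", "Europe", "London", "Rome", "Berlin", "Lyon", "1889", "tall", "famous"]
logits = [6.1, 2.9, 2.5, 2.4, 2.2, 2.0, 1.9, 1.8, 1.6, 1.2, 1.1, 1.0]
plot_top_k(logits, tokens, k=10, temperature=0.7)
```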

Failure modes: what to look for (and how to provoke them)

Use targeted prompts to elicit and analyze failures. Below are common failure modes with classroom tests and explanations.

ELIZA failure modes

  • Brittleness: rephrase input slightly and the pattern fails. Test: replace a synonym or add punctuation. Explanation: pattern-matching lacks semantic generalization.
  • Forced reflection illusion: ELIZA often turns user input into a reflective question, creating a feeling of understanding without grounding. Test: ask factual questions — ELIZA will steer to reflection.
  • Rule collisions & order sensitivity: different rules may match; the first wins. Test: reorder rules and show changed outputs.

Transformer failure modes (modern ML)

  • Hallucinations: confidently stated but false assertions. Test: ask for verifiable facts (e.g., obscure dates) and check sources. Explanation: learned statistical patterns—not a grounded knowledge base.
  • Inconsistency: model contradicts itself across turns. Test: ask a fact, then later ask about the same fact — response may differ. Track with the failure-map timeline.
  • Overgeneralization & bias: reflects biases in training data. Test: present social scenarios; compare outputs across demographic variables.
  • Exposure bias and sampling artifacts: beam search vs. sampling produces different fluency/creativity tradeoffs. Test: generate with low vs. high temperature and note hallucination frequency.

Quantifying failures: simple metrics for class use

Keep metrics lightweight so students can measure progress. Possible classroom metrics (a scoring sketch follows this list):

  • Hallucination rate: fraction of factual prompts with incorrect answers (verified by teacher/reliable source).
  • Brittleness score: fraction of paraphrases that change ELIZA’s reply radically.
  • Stability index: percent of repeated factual queries that yield consistent answers from the transformer.
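A scoring sketch for these three metrics, assuming each logged probe is a plain dict with teacher-labelled flags (the field names are illustrative, not a fixed schema):

```python
def hallucination_rate(turns):
    """Fraction of factual prompts answered incorrectly (teacher-verified labels)."""
    factual = [t for t in turns if t["is_factual_prompt"]]
    return sum(t["answer_incorrect"] for t in factual) / len(factual)

def brittleness_score(paraphrase_pairs):
    """Fraction of paraphrases that radically changed ELIZA's reply."""
    return sum(p["reply_changed"] for p in paraphrase_pairs) / len(paraphrase_pairs)

def stability_index(repeated_queries):
    """Percent of repeated factual queries answered consistently by the transformer."""
    return 100 * sum(q["consistent"] for q in repeated_queries) / len(repeated_queries)

turns = [
    {"is_factual_prompt": True,  "answer_incorrect": True},
    {"is_factual_prompt": True,  "answer_incorrect": False},
    {"is_factual_prompt": False, "answer_incorrect": False},
]
print(f"Hallucination rate: {hallucination_rate(turns):.0%}")   # -> 50%
```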

Practical advice: prompts, safety, and pedagogy

Use these practical tips to run a safe, meaningful lesson.

  • Pre-vet prompts for sensitive content. Avoid personal data or emotionally triggering scenarios with minors. For policies and access control, review a security checklist before granting tools broad permissions.
  • Start with constrained tasks: factual Q&A and paraphrase tests before open-ended conversation.
  • Demonstrate sampling controls: show temperature, top-k, and top-p adjustments and how they affect hallucination and variability.
  • Use small local models for transparency when possible — these are cheaper and easier to visualize in-class; open-source options are covered in write-ups comparing open-source and proprietary approaches.
  • Encourage evidence-checking: require students to cite sources when a model provides a fact. This builds media literacy and aligns with principles from ethical pipeline design.

Lesson plan (45–60 min) — timeline and learning objectives

  1. 5 min: Warm-up — quick discussion about “what does it mean to understand?”
  2. 10 min: Demo — type the same prompt to ELIZA and the transformer; observe differences.
  3. 15 min: Guided exploration — students run paraphrase and factual probes in small groups; record failures.
  4. 10 min: Visualization deep-dive — inspect attention heatmaps and token charts; interpret key differences.
  5. 5–10 min: Reflection — students write short explanations of why each system failed and propose fixes.

Advanced extensions (for older students and CS tracks)

For learners ready to dig deeper, assign projects that require code and evaluation:

  • Implement a minimal ELIZA engine from pattern lists and compare with a fine-tuned small transformer on the same prompts.
  • Build a diagnostic suite that runs 200 prompts and computes hallucination rate, response length, and perplexity.
  • Experiment with hybrid systems: chain-of-thought or retrieval-augmented generation that grounds transformer outputs with a document store to reduce hallucinations.

Why the hybrid approach matters in 2026

Late 2025 brought wider adoption of retrieval-augmented generation (RAG) and evidence-centered models in education. The hybrid approach pairs ELIZA-style explicit rules or retrieval checks with transformer fluency to get the best of both worlds: explainability and groundedness. Show students a simple hybrid flow:

  1. Detect factual query via rule or classifier.
  2. Run retrieval to get evidence passages.
  3. Condition the transformer on retrieved passages and generate an answer with citations.

This approach reduces hallucinations and teaches students how structure, rules, and data can be combined. When you design retrieval and evidence layers, treat them like any other data pipeline and consult resources on ethical data pipelines to preserve provenance and auditability.
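A toy end-to-end version of the three-step flow looks like the sketch below; the in-memory document store, keyword-overlap retrieval, and generate() stub are deliberately simple stand-ins for whatever index and model your demo already uses.

```python
import re

DOCS = {   # toy document store standing in for a real retrieval index
    "doc1": "ELIZA was written by Joseph Weizenbaum at MIT in the 1960s.",
    "doc2": "Transformers predict the next token from a probability distribution.",
}

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_factual_query(prompt):
    # Step 1: crude rule-based detector (a trained classifier could replace this).
    return any(w in prompt.lower() for w in ("who", "when", "where", "what year"))

def retrieve(prompt, k=1):
    # Step 2: rank documents by naive keyword overlap with the prompt.
    q = tokens(prompt)
    ranked = sorted(DOCS.items(), key=lambda kv: -len(q & tokens(kv[1])))
    return ranked[:k]

def generate(prompt):
    return f"(model output for: {prompt!r})"     # stand-in for the transformer call

def answer(prompt):
    if not is_factual_query(prompt):
        return generate(prompt)                  # plain generation for chit-chat
    evidence = retrieve(prompt)
    context = " ".join(text for _, text in evidence)
    cites = ", ".join(doc_id for doc_id, _ in evidence)
    # Step 3: condition the model on the evidence and require citations.
    return generate(f"Answer using only this evidence: {context}\n\nQ: {prompt}") + f" [{cites}]"

print(answer("Who wrote ELIZA?"))
```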

Assessment ideas and rubric

Evaluate student understanding with both conceptual and practical tasks.

  • Concept quiz: explain why ELIZA can appear empathetic without understanding.
  • Practical test: produce paraphrases that break ELIZA and measure brittleness.
  • Project deliverable: a short report that documents 50 prompts, model outputs, failure counts, and suggested fixes.

Real-world examples & classroom anecdotes (experience & expertise)

Teachers who ran earlier versions of this demo reported that middle-school students quickly learned to distrust single-shot assertions and started habitually asking for evidence. In January 2026, an EdSurge piece described similar classroom moments where ELIZA revealed AI's mechanical gaps and helped students form better questions. Use those anecdotes to seed class discussion about why trust and verification matter.

Classroom takeaway: encountering ELIZA’s mirrors and a transformer’s confident errors makes AI limitations tangible — students learn to ask “How do you know that?” instead of “Is it true?”

Common pitfalls when running the demo (and how to avoid them)

  • Overloading the demo with too many visuals. Start with one or two and add depth only after students grasp basics.
  • Choosing a too-large transformer. Use small models for interactivity; otherwise latency kills engagement. Consider low-latency visual streaming approaches in hybrid studio ops write-ups for optimizations.
  • Skipping verification steps. Always have a reliable source for fact checks to label hallucinations accurately.

Future predictions: classroom AI literacy in 2026–2028

Expect these trends to shape similar demos in the next two years:

  • More open-weight, explainable models enabling deeper visualizations in-class without cloud costs.
  • Standardized AI literacy modules in K–12 curricula emphasizing hallucination testing and source verification (policy moves seen in late 2025 support this).
  • Hybrid educational tools that combine rule-based checks (for safety and reliability) with transformer fluency to provide explainable AI tutoring.

Actionable takeaway checklist (for teachers)

  • Set up a side-by-side demo with ELIZA and a small transformer.
  • Create a 50-prompt diagnostic suite (facts, paraphrases, adversarial inputs).
  • Include visualization panels: rule match trace, token probability bars, and attention heatmaps.
  • Measure hallucination and brittleness rates and run a reflection activity.
  • Introduce a simple RAG pipeline to show how grounding reduces hallucinations.

Final thoughts — why this comparison is a powerful teaching tool

Comparing ELIZA and transformer chatbots in a transparent, visual way demystifies both systems. ELIZA teaches rule logic and brittleness; transformers teach probabilistic reasoning and the risk of persuasive falsehoods. Together they give students a balanced, research-grounded understanding of current AI — the best defense against overtrust and misinformation.

Call to action

Ready to run this demo in your classroom? Download the starter kit, lesson plan, and visualization code from our demo repo and try the guided activity with your students. Get the starter materials, sample prompts, and rubric to transform your next lesson into an evidence-first AI lab.
