Stop Teaching What AI Already Does: How Classroom Structures Are Changing
1. The essay outline a student did not write
A student in my AS Economics class hands in an essay outline. The structure is too clean. The argumentative moves are textbook. The evaluation paragraph weighs three considered trade-offs in the exact form Cambridge wants. I ask the student to defend the outline orally. Why did you start with the supply-side response? Why is your evaluation paragraph pointing this way and not that way? The student freezes. They cannot reproduce the outline in their own words. They know what I suspect. I know what they are thinking. Neither of us quite says it.
This is a recurring pattern in my classes now, not a single exchange. The question it raises is not the one I used to ask. The old question was how I catch academic dishonesty. The new question is what I should actually be teaching my students in the age of AI.
I have spent the last few months redesigning my classroom around an answer to the new question. In this article, I want to go over some principles I've arrived at. The next article, Part 2, walks through the practice.
2. The structural asymmetry
AI is everywhere outside class. Students access it on their phones, on their laptops, through their friends' accounts. The classroom is the one zone where I still control the conditions of cognitive work. The terminal exam at the end of the syllabus is closed-book, paper-based, in person. No AI in the room.
The skills the exam tests are exactly the skills AI most readily substitutes for in low-stakes practice. Cambridge calls them AO1, AO2, and AO3. Roughly: AO1 is knowing the content and applying it. AO2 is analysis. AO3 is evaluation. Most subjects with a high-stakes terminal assessment have some version of this layered structure. Lower tiers test recall and basic application. Higher tiers test analysis and judgment.
AI substitutes most easily for student thinking across all three assessment objectives, and this is exactly the work the exam rewards most heavily. With AI, the untrained student walks away with a finished outline, a clean analysis, and a well-shaped evaluation. Yet the external exam will not reward that borrowed output; it rewards only the skill the student can show unaided. If they do not actually practice, they do not develop. If they do not develop, the exam catches the gap.
The path from syllabus to exam used to run through graded homework. That path now carries a signal I can no longer trust to be honest. The outline a student produces at home tells me almost nothing about what they can produce unaided. So homework either gives up its signaling value or the signal has to come from somewhere else.
3. What the research actually says about AI in learning
In Bastani and colleagues' 2025 PNAS study, about a thousand Turkish high school students were observed across four sessions of math practice. One group practiced with a vanilla GPT-4 interface. A second group practiced with a guardrailed AI tutor that gave hints one step at a time and refused to reveal full solutions. A third group had no AI access. The results split into two stories. With AI access during practice, grades rose 48 percent for the vanilla group and 127 percent for the guardrailed-tutor group. When AI was taken away and the students sat unaided, the vanilla group scored 17 percent worse than the no-AI controls. The guardrailed-tutor group held their own.
A second study pairs closely with the first. A Harvard physics course built an AI tutor with the same kind of pedagogical scaffolds and tested it against active learning. The AI tutor came out ahead. Together the two studies say something simple and consequential. AI tutoring works when it is pedagogically designed. AI used as a frictionless answer machine produces worse unaided performance than no AI at all.
Rozenblit and Keil named the illusion of explanatory depth in 2002. People who read clear explanations rate their understanding much higher than they can actually demonstrate. The illusion gets bigger as the explanation gets clearer. AI is the most frictionless explainer ever built. Its explanations are clearer than textbooks, clearer than my explanations, clearer than anything a student has previously had on demand. That clarity inflates the illusion. The student walks away from the AI session feeling they understand elasticity, but they cannot reproduce the diagram unaided, let alone transfer it to a new context.
Roediger and Karpicke showed in 2006 that retrieval under effort encodes more deeply than re-exposure. Students who tested themselves remembered far more a week later than students who only re-read. Frictionless explanation feels like learning but is not. Retrieval is the learning event.
Older still, Slamecka and Graf showed the generation effect in 1978. You encode more from generating your own imperfect answer than from reading a perfect one.
The convergent design principle is clean. Design the AI use. Do not ban it. Do not let it default to a frictionless explainer.
4. The principle
In-class becomes an AI-free cognitive gym. It specialises in live, unaided, observable reasoning. The skills that atrophy without practice get exercised in the room, because nothing else in the student's day exercises them anymore.
Out-of-class becomes AI-augmented intake, retrieval, and spaced practice. The at-home AI is deliberately designed to behave like a Socratic tutor: one step at a time, no full solutions, prompts that force generation. The guardrails are the difference between AI that builds the student and AI that hollows them out.
My classroom next year might be the only AI-free hour these kids still get. Everywhere else, AI is the default.
5. The audit: Keep, Discard, Magnify, Modify
The redesign is the result of running every traditional teaching activity through one question. What does this activity still teach in an AI world?
The audit returns one of four verdicts. Keep: the activity teaches what it used to and AI does not change that. Discard: AI now does this better and any class or homework time on it is wasted. Magnify: the activity teaches something AI cannot replicate, and it deserves more focus. Modify: the activity used to teach the right thing but the AI default has broken the signal, so the activity needs structural redesign to recover the cognitive work.
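As a rough illustration, the four verdicts can be written as a small decision rule. This is only a sketch: the three yes/no questions, the order in which they are checked, and the names `Verdict` and `audit` are assumptions made for the example, not a fixed procedure.

```python
from enum import Enum


class Verdict(Enum):
    KEEP = "keep"
    DISCARD = "discard"
    MAGNIFY = "magnify"
    MODIFY = "modify"


def audit(ai_does_it_better: bool,
          ai_broke_the_signal: bool,
          ai_cannot_replicate: bool) -> Verdict:
    """Map three yes/no judgments about an activity onto one verdict.

    The check order is an assumption of this sketch: an activity AI
    already does better is dropped before anything else is considered.
    """
    if ai_does_it_better:
        return Verdict.DISCARD   # class or homework time on it is wasted
    if ai_broke_the_signal:
        return Verdict.MODIFY    # right skill, but needs structural redesign
    if ai_cannot_replicate:
        return Verdict.MAGNIFY   # deserves more focus than before
    return Verdict.KEEP          # AI changes nothing about what it teaches


# Hypothetical judgments for one activity: AI cannot replicate it.
print(audit(False, False, True).value)
```

The judgments themselves remain the teacher's call; the rule only makes explicit that each activity ends up with exactly one verdict.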
The audit runs across three domains.
Teacher work in class. Direct concept lectures should be discarded. The Bastani finding makes the case obvious: AI delivers a personalised, pace-controlled, infinitely patient explanation at home. Spending classroom minutes on the same explanation is a waste of the only AI-free hour I have. What survives and magnifies is everything live: cold call, narrated worked examples that model expert thinking aloud, real-time diagnosis of what the cohort understands. Formative checks change. They become device-off, with verbal justification, because that is the only way I can see what a student actually knows.
Student work in class. Cold-attempt practice magnifies. Oral defense of one's own work magnifies. Hand-drawn diagrams magnify. The student's role as lecture listener shrinks. Co-building essays modifies into stress-testing each other's evaluations. The unique value of co-building is the live argumentative friction. AI can co-build at home, but it cannot replicate the in-room defense of a position you have to hold under questioning.
Student work out of class. I will try intake reading and concept videos with AI assistance, paired with a non-optional retrieval check that I run in class the next morning. The retrieval check is what defeats the fluency illusion. MCQ practice modifies onto a cold-attempt protocol: students must attempt every question unaided first, then bring the AI in as a tutor for the wrong answers only. This is the practical operationalisation of the Bastani guardrailed-tutor finding. Essay outlines as homework should be discarded. Full timed essays and past papers should move into class as the only honest mastery signal.
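The cold-attempt rule for MCQ practice can be sketched in a few lines. This is a minimal illustration, not a tool described in this article: the `Attempt` record and the `questions_for_ai_review` function are hypothetical names, and it assumes answers are single option letters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Attempt:
    question_id: str
    student_answer: str   # option letter chosen on the unaided pass
    correct_answer: str   # option letter from the answer key


def questions_for_ai_review(attempts: list[Attempt]) -> list[str]:
    """Return the questions the student may take to the AI tutor.

    Every question must have an unaided answer first; only the
    wrong answers are released for AI-assisted review.
    """
    unanswered = [a.question_id for a in attempts if not a.student_answer.strip()]
    if unanswered:
        raise ValueError(f"Attempt these unaided first: {unanswered}")
    return [
        a.question_id
        for a in attempts
        if a.student_answer.strip().upper() != a.correct_answer.strip().upper()
    ]


# Hypothetical practice set: one wrong answer out of three.
practice = [
    Attempt("Q1", "B", "B"),
    Attempt("Q2", "A", "C"),
    Attempt("Q3", "d", "D"),
]
print(questions_for_ai_review(practice))  # ['Q2']
```

Refusing to release anything until every question has an answer is the design choice that matters here: it keeps the unaided attempt ahead of the AI explanation.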
6. What teaching becomes when lecturing dies
Lecturing was always a bundle of jobs the profession had treated as one. Information transfer. Modeling expert thinking aloud. Sequencing examples. Curating which examples land. Coherence across the term. Motivating. Diagnosing the room in real time. Demonstrating mastery so students see what fluent use looks like.
We mistook the bundle for a single job because the jobs all happened in the same forty-five minutes. Even if AI takes over the task of knowledge transfer, we can still focus on the other jobs.
Unbundle the lecture and the jobs go in different directions. AI takes information transfer. The teacher keeps everything else, and those jobs spread across the whole period. Entry slip as a diagnostic. Narrated worked example. Cold call. Debate. Live coaching of an argument under construction. Each of these is possible outside the lecture format.
The new identity for teachers in an AI world is coach, curator, diagnostician, and judge. Coach in the sense that I am running live drills, not delivering content. Curator in the sense that I am choosing what counts as a worthwhile problem. Diagnostician in the sense that I am reading the room and adjusting in real time. Judge in the sense that the live, unaided performance is the only signal I trust now. Teachers did all of this before; now it will become even more important.
A note on method: this issue was produced through the co-creation workflow I'm advocating. The idea, the angle, the practitioner observations, the curated sources, and the final wording are mine. An AI assistant calibrated to my voice (through a guide of phrases I've approved and rejected) did the research legwork on sources I selected and drafted from an outline we agreed on.
References
- Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., & Mariman, R. Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences. 2025.
- Kestin, G., Miller, K., et al. AI Tutoring Outperforms Active Learning. Harvard Physical Sciences 2 study. 2024/2025.
- Rozenblit, L., & Keil, F. The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science 26(5): 521–562. 2002.
- Roediger, H. L., & Karpicke, J. D. Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science 17(3): 249–255. 2006.
- Slamecka, N. J., & Graf, P. The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory 4(6): 592–604. 1978.
- Bjork, E. L., & Bjork, R. A. Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In Gernsbacher et al. (Eds.), Psychology and the real world. Worth Publishers. 2011. (Previously cited; see the earlier AI in Schools issue on student thinking and AI.)