18 May 2026 9 min read Leon's Notes

When AI improves student thinking — and when it short-circuits it

A workflow that diagnoses student learning, plus workflow-based assignments, are structural answers to holding students accountable for hard, productive struggle in the age of AI.

Photo by AbsolutVision / Unsplash

The specimen is this session

I noticed, mid-conversation, that my thinking was getting sharper. I was writing this piece in a back-and-forth with Claude. Claude presented me with perspectives I hadn't seen before. I gave feedback on the points it made. It pushed back. I felt like I was in intellectual discourse with a co-worker. I could not at first say why.

That is the practitioner side of this article. It is N=1. The N is me. I am writing from inside the moment.

I floated three candidate causes for why the work was helping me learn. Maybe it was forcing me to think. Maybe it was forcing me to read someone else's words. Maybe it was exposing me to new research. None of those felt like the load-bearing part. We sparred for a while. Then I landed on an argument I felt comfortable with: the back-and-forth made writing with an AI take longer than writing on my own. AI wasn't making my workflow more efficient; instead, it helped me zoom in on the valuable parts of that workflow.

By the end of this article, you will have a diagnostic question that every student who used AI effectively should be able to pass, a workflow-level understanding of what counts as good AI use and what counts as bad, and an honest map of the roles of AI based on established research. My thinking is also happening in the writing. It'll be like you're reading my mind as it learns and forms an opinion.

What the harm data is saying

RAND surveyed American youth in December 2025 and found AI use for homework had jumped from 48% in May 2025 to 62% by year-end. 67% of students endorsed the claim that the more they used AI for schoolwork, the more it would harm their critical thinking, up more than ten points in ten months. RAND frames the question for schools as whether AI use is cognitive offload or cognitive augmentation. The survey was funded by the Gates Foundation.

Closer to the audience for this piece, AmCham China interviewed heads of school at Wellington Tianjin, WAB, ISA Wuhan, QSI Chengdu, Optics Valley, and others. The leaders are already drawing the cognitive-offload versus cognitive-augmentation line in their own words. The offload side has a name, metacognitive laziness: students offloading the thought process itself, not just the typing.

We can't ban AI; students will access it outside of class regardless. So the design question is what to do with that fact.

The flip-side question

What if AI use can improve student thinking? Under what conditions? With what mechanism?

That is a question worth asking, and the answer, in my opinion, has three layers.

Layer one — what learning science already knows

Robert and Elizabeth Bjork's work on desirable difficulties is the load-bearing established-research citation. Practice that feels harder in the moment tends to hold up better long-term. Ease while you are learning is a poor predictor of what you can do later. Retrieval practice, spacing, interleaving, varied conditions — these are the moves that build durable learning. Schools need to incentivize desirable difficulties.

The work of Ericsson and colleagues on deliberate practice is the structural counterpart. The practice has to be effortful, at the edge of current ability, with feedback that lets you refine. Deliberate practice is by definition built into task structure, not added on top.

Difficult and deliberate tasks completed over time have always been the key to learning.

Layer two — what we know about AI

Kosmyna and colleagues at MIT ran a study with three groups writing essays: brain-only, search engine, and ChatGPT. The ChatGPT group showed the lowest neural engagement. In that group, 83% could not quote from the essays they had just written. The authors call it cognitive debt: short-term ease, long-term cost. Even though there were only 54 participants, the results might still hint at big implications for AI use in our classrooms. We should design the class so that students take part in tasks that demand high neural engagement.

Fakour and Imani, writing in Frontiers in Education, compared ChatGPT against human tutors with 230 university students in Taiwan. Whether AI use improves thinking is conditional upon how it is used. Structured Socratic AI use produced critical-thinking gains when blended with human mentorship and scaffolding. Unstructured AI use produced surface-level engagement. The AI-as-Socratic-partner mechanism is real but contingent on the design around it.

Because there's no effective way to control students' AI use outside the classroom, schools need to build a culture where students take initiative to use AI productively.

Layer three — what I think, after sparring with Claude

What I ended up sending Claude, in the moment described at the beginning of this article: the big work is that you disagree, ask questions, and press until I respond to them sufficiently. That is hard work and takes time (and it is the work that students sometimes dislike). The "you" there is the AI partner. The mechanism is substantive pushback over time.

If Bjork and Ericsson and the Frontiers finding all point at the same thing, and my own session adds one more data point, substantive pushback over time is a plausible candidate mechanism for AI-enabled thinking improvement. It worked for one motivated adult with a custom workflow. I hope it will work for my students as well.

The diagnostic

When we're forced into taking a stand for what we know, we should be able to answer a question.

Can we defend the claims we made and explain why we completed the assignment the way we did, even after the AI use?

Yes — the AI use was thinking-improving.

No — the AI use was thinking-degrading.

This is the diagnostic. The same AI tool used two different ways gives two different answers. As I write, the AI partner can ask me to defend any claim or explain any structural move. I have to answer in my own words. That is the diagnostic in real time.

There is one more thing the diagnostic does. It situates me within my own workflow for learning. It lets me iterate on my own thinking processes and, through them, improve my writing workflow. The verification mechanism makes the design better. If Claude asks me what I know and I don't know, or I find the writing lackluster, or the research one-sided, I can change the workflow to improve my learning.

The dual challenge under the diagnostic

Constantly running the diagnostic presents a problem. The work it tests (how students think through the process and why they did what they did) is exactly the work students are most likely to skip in the age of AI, and there are two distinct reasons why.

Time scarcity

My students are genuinely busy: a crippling course load, as many as 7 exams in a week, university applications, sports, extra-curriculars and competitions, external classes, family, the rest. The mechanism I named, pushback over time, is uncomfortable, slow, and effortful. Rationally, when the cheaper route exists for the same outcome, the cheaper route wins. The very thing that makes the mechanism work is what makes it the first thing to go. That is, unless we change the outcome. In our assignments, what can students learn aside from the subject knowledge? Can we AI-proof our assignments and also teach empathy, or kindness, or integrity, or global citizenship? Can our assignments be interdisciplinary? Have real audiences? Actually tackle real problems?

Traditional task design no longer cuts it

I have to assign holiday homework because we have a low number of teaching days. Out of necessity, I asked students to learn a whole economics unit on their own and fill in worksheets that rephrased the learning objectives as questions. Some students completed it authentically; I could see the knowledge had landed when class restarted. Others handed in worksheets of similar surface quality, but it became clear during assessment that the knowledge had not been retained (or even learned). My inference is that AI had completed those worksheets, also to a high quality, though I have no proof beyond the poor summative grades.

Two things were happening. Because the assignment was graded, it created extrinsic pressure that tipped some students into using AI. In addition, worksheets are only a proxy, and they cannot represent learning in the age of AI. Had I asked the AI-completers to defend their worksheet answers and explain why they reasoned that way, the gap would have been audible.

The operational answer is already worked out

This piece does not reinvent the operational answer. A prior piece — Values Through Academics — already worked it out: process artifacts earn completion credit; the assessed grade lives at the link between process artifacts and the final product. That change responds to motivation collapse more directly than to time scarcity, because it shifts what students are actually being marked on. It does not solve time scarcity, though. That will continue to be an influential factor in our schools.

On task redesign: A workflow-level list, not a task-level list

The temptation is to write a list of good and bad AI uses by task. Essay good, brainstorm bad. Spell-check fine, summary risky. That list will not survive contact with a classroom because the same task can run three different ways and produce three different cognitive outcomes. Also, AI is moving too fast for a stationary list of do's and don'ts.

Mollick and Mollick name seven approaches in their working paper: AI as tutor, coach, mentor, teammate, tool, simulator, and student. This is helpful because they classify AI by role and relationship, not by rules and expectations. Let me borrow that framing to help us think at the level of how the AI is used, not what it is used for.

Here are the relationships I think students most commonly hold with AI:

AI as Socratic partner. Presses the student. Asks them to defend. Refuses to write for them. Diagnostic passes because the student is doing the thinking.
AI as feedback engine on student work. Marks up the student's own draft. Diagnostic passes when the student has to act on the feedback and critically assess its quality.
AI as offload of mechanical work. Citation formatting, table cleanup, transcription. This reserves AI for accelerating tasks that don't themselves improve thinking and learning.
AI as explainer. Concept clarification a teacher or textbook would also give. Diagnostic passes when the student rephrases it back or asks follow-up questions.
AI as content generator. Writes the essay. Diagnostic fails because the student is not doing the hard thinking.
AI as summariser. Condenses a reading the student should have done. Diagnostic might fail because the AI got to decide what content to omit and what to keep.

Think about which relationship is appropriate for the workflows you assign to your students, and teach them how to manage that relationship.

Same task, three workflows

Take a 1,500-word IB Economics essay on inflation.

Workflow A. The student writes from scratch. AI checks grammar at the end. The student can defend every claim because they made every claim. AI is a feedback engine.

Workflow B. The student outlines. AI researches and drafts the body. The student edits for tone and clarity. The student can describe the essay but cannot defend why a given paragraph took the angle it did. The thinking happened on the AI side. This is AI as a content generator.

Workflow C. The student writes a draft. The AI argues against the thesis, paragraph by paragraph. The student must defend each move or revise. The student now holds the argument better than if they had drafted alone, because the pushback forced them to find the seams. This is AI as Socratic partner.

Same task. Three radically different cognitive outcomes. As teachers, we can no longer afford to just assign tasks to our students. We need to assign and grade entire workflows, and think about the role AI should have in each step of the workflow.

Honest close

What this piece argues, in one sentence: a workflow that diagnoses student learning, plus workflow-based assignments, are structural answers to holding students accountable for hard, productive struggle in the age of AI. The workflow does not have to include AI by design, but the learning effect could perhaps be amplified if an AI provides friction in the form of pushback that forces the user to defend their position or alter their perspective.

The writing of this piece was me running the diagnostic on myself in real time. You watched the diagnostic working from inside. This article is the output. Whether the same shape can transfer to my students is an open question I'd like to tackle in the future.

A challenge for you to see the (uncomfortable) truth: pick one student and one AI-touched assignment this term. Ask them to defend the claims they made and explain why they did what they did. What do you hear, and what does it tell you?

A note on method: this issue was produced through the co-creation workflow I'm advocating. The idea, the angle, the practitioner observations, the curated sources, and the final wording are mine. An AI assistant calibrated to my voice (through a guide of phrases I've approved and rejected) did the research legwork on sources I selected and drafted from an outline we agreed on.

References

Schwartz, H. L., & Diliberti, M. K. More Students Use AI for Homework, and More Believe It Harms Critical Thinking: Selected Findings from the American Youth Panel. RAND Corporation, RR-A4742-1. 2026.
AmCham China. Learning in the Age of AI: How China's International Schools Are Adapting. AmCham China Quarterly Magazine, Issue 2. 2025.
Bjork, E. L., & Bjork, R. A. Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. UCLA Bjork Learning and Forgetting Lab. 2011.
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363–406. 1993.
Kosmyna, N., et al. Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. MIT Media Lab / arXiv preprint. 2025.
Fakour, H., & Imani, M. Socratic wisdom in the age of AI: a comparative study of ChatGPT and human tutors in enhancing critical thinking skills. Frontiers in Education, 10:1528603. 2025.
Mollick, E. R., & Mollick, L. Assigning AI: Seven Approaches for Students, with Prompts. SSRN / Wharton School Research Paper. 2023.