
AI has changed the assessment conversation. In many classrooms, the problem is no longer whether students can use AI, but whether a task can still reveal what they understand when AI is available. That means assessment design needs to move beyond simple bans and towards tasks that foreground reasoning, judgement and process. If your school is already reviewing assessment integrity, you may also find useful context in this wider look at settled practice and assessment integrity.
Why it matters
AI-resilient assessment matters now because many traditional tasks are easy to outsource. A polished essay written at home, a neat set of maths answers with no visible thinking, or a generic research summary can all be produced quickly by a capable tool. The issue is not just cheating. It is validity. If a task can be completed well without the learner doing the core intellectual work, then the assessment may no longer measure what it claims to measure.
This is especially important for teachers who want to preserve both fairness and ambition. Students still need to write well, solve problems, interpret evidence and communicate clearly. But they also need to show how they arrived at their answers. In practice, the strongest response is not panic. It is redesign.
Easy for AI to reproduce
Tasks are easiest for AI to reproduce when they are predictable, generic and product-only. If the prompt asks for a standard explanation, a familiar essay structure or a summary of common content, AI will usually generate something plausible. If there is no requirement to show intermediate thinking, justify choices or respond live to a follow-up question, it becomes hard to tell whether the student did the intellectual heavy lifting.
A useful test is simple: could a student paste the task into a chatbot and receive something that would earn respectable marks with light editing? If the answer is yes, the task needs strengthening. That does not mean making it obscure or overly complicated. It means making disciplinary thinking visible.
Five design principles
The most effective AI-resilient tasks tend to share five features. They ask students to think in real time, preserve evidence of process, defend decisions, work within meaningful constraints and apply subject-specific habits of mind.
Live reasoning matters because it is difficult to outsource spontaneous explanation. A short viva, conference or in-class check-in can reveal whether a student really understands their own work. Process evidence matters because drafts, annotations, planning notes and worked steps show the path, not just the destination. Oral defence matters because students must explain why they chose a method, interpretation or conclusion. Carefully chosen constraints matter because they reduce generic responses; a specific source pack, dataset, text extract or time-limited condition narrows the space for AI-generated vagueness. Finally, disciplinary thinking matters because each subject values different forms of reasoning. Assessment should target those directly.
These principles also connect well with broader classroom uses of AI. For example, if you are exploring voice-based tools for formative work, this article on voice AI in schools offers useful ideas about fluency, accessibility and verbal response.
English
In English, the vulnerable task is often the polished take-home essay. AI is very good at producing competent literary analysis in a familiar structure. The answer is not to abandon extended writing, but to redesign it so that interpretation must be owned and defended.
One effective approach is to split the task into stages. Students might first annotate a short unseen extract in class, then write a focused interpretation using those annotations, and finally complete a brief oral defence with questions such as, “Why did you prioritise this image?” or “What alternative reading did you reject?” The final mark can draw on the written response, the annotations and the oral explanation. This makes the student’s interpretative journey visible.
Another strong option is to use constraints that reward specificity. Ask students to build an argument around one quotation, one structural choice and one contextual tension, rather than inviting a broad essay that AI can fill with generic points. If you want students to engage critically with generated writing itself, this piece on AI poetry critique and remix shows how comparison and critique can sharpen literary judgement.
Maths
In Maths, final answers alone are no longer enough. AI can often provide correct solutions, and sometimes plausible but flawed ones. That makes method, choice and error analysis much more important.
A resilient maths assessment might ask students to solve a problem, compare two possible methods and explain which is more efficient under given conditions. Another might present a worked solution containing a subtle mistake and ask students to diagnose the error, explain why it happened and correct it. These tasks assess mathematical thinking rather than answer retrieval.
You can also build short oral checkpoints into problem-solving. After completing a question, a student might be asked, “Why did you choose substitution instead of elimination?” or “What told you this graph could not represent the situation?” Such questions are brief, but they reveal depth. For more ideas on making explanations and methods central, this mathematics playbook is a helpful companion.
Science
In Science, AI can summarise content well, but it is weaker at tasks that require reasoning from evidence, evaluating a method and connecting an explanation to actual observations. That is where resilient design begins.
Instead of asking for a generic account of photosynthesis or forces, ask students to interpret a specific dataset, explain an anomaly, evaluate a practical method or justify which conclusion is best supported by the evidence. For instance, students might receive results from an investigation with one inconsistent reading and be asked whether it should be discarded, repeated or retained. Their answer should include both scientific reasoning and methodological judgement.
Practical work also offers rich opportunities. A student can plan an investigation, carry it out, record observations and then explain live why they controlled certain variables. Even where full practical assessment is not possible, scenario-based tasks can still test practical reasoning. If you are thinking about how AI can support, rather than replace, scientific evaluation, this article on advanced AI for science research and evaluation may prompt useful planning ideas.
Humanities
Humanities subjects rely on argument, source use and judgement, all of which can look convincing when generated by AI. The key is to anchor tasks in constrained evidence and require students to make choices they can justify.
A stronger history or geography task might provide a limited source pack and ask students to rank the usefulness of sources for a precise enquiry, then defend that ranking. Another might require students to build an argument using only three selected pieces of evidence, with a short reflection explaining why they left other material out. This rewards discrimination, not just accumulation.
Judgement under constraint is particularly powerful. In a history classroom, for example, students might answer a causation question using one political, one economic and one social factor, then justify their weighting of each. AI can produce a plausible essay, but it cannot easily replace a student’s live defence of why one factor mattered more within this exact evidence set. This history planning article explores source criticism and causation in ways that fit this approach well.
Languages
Language teachers have perhaps the clearest route to AI-resilient assessment because spontaneous production and live comprehension remain central. If a task depends entirely on polished written output completed at home, it is vulnerable. If it includes real-time listening, speaking, mediation and response, it becomes much harder to outsource.
A practical redesign might combine a short prepared writing task with an in-class follow-up conversation in which the student must expand, clarify or adapt what they wrote. Another option is mediation: students read or hear a message in one language and explain its meaning, purpose or tone in another. This tests understanding, not memorised output.
Live comprehension also matters. Students can listen to a short unseen passage, answer questions, then explain how they inferred meaning from context, cognates or verb forms. This reveals strategy as well as accuracy. Voice tools can support rehearsal before assessment, but the assessed moment should still capture spontaneous language use and responsive thinking.
Using AI well
AI can still have a place in preparation. In fact, banning it completely may deprive students of useful support. The better question is how to use AI without weakening the assessment.
Students can use AI to generate practice questions, quiz themselves on vocabulary, compare alternative explanations or identify gaps in revision. Teachers can use it to create parallel examples, draft source packs or produce model answers for critique. The important distinction is that AI supports rehearsal and feedback, while the assessment itself is designed to reveal independent understanding. If you are building revision routines around this idea, this workflow for mock-season revision offers a sensible balance between support and integrity.
Quick audit checklist
Before setting your next task, it helps to ask a few blunt questions. Does the assessment capture process as well as product? Will students need to explain choices, not just present outcomes? Is there a live, timed or oral element somewhere in the sequence? Are the constraints specific enough to prevent generic responses? Most importantly, does the task assess the distinctive thinking of the subject?
If the answer to several of these questions is no, the task probably needs redesign. Often the fix is not dramatic. A short viva, an annotated draft, a source-limited response or an error-analysis component can transform a familiar assessment into one that better reflects genuine learning.
Final thoughts
AI-resilient assessment is not a defensive retreat. Done well, it improves assessment quality for everyone. It values reasoning over performance polish, makes learning more visible and keeps subject thinking at the centre. That is good practice whether AI exists or not.
The goal is not to create impossible-to-game tasks. It is to create better ones: assessments in which students must interpret, justify, decide and explain in ways that belong to the discipline. When that happens, AI becomes less of a threat and more of a prompt to assess what truly matters.
May your next assessment reveal the thinking you most want to see.
The Automated Education Team