
Claude Opus 4.6 arrives with the sort of promise that always catches educators’ attention: more capable planning, better-structured outputs and smoother workflows across longer tasks. For schools, the real question is not whether the model sounds impressive in a demo. It is whether it can produce planning that a teacher would actually use next week, with minimal revision. That is especially important now that more leaders are exploring coordinated AI workflows, rather than one-off prompts, as discussed in role-based school workflows.
In this test, I used Claude Opus 4.6 with agent teams to build a half-term scheme of work, then exported the output to PowerPoint. The aim was simple: could this workflow generate something coherent enough for a department meeting, polished enough to share, and accurate enough that the editing burden stayed manageable? That is a tougher standard than “looks good at first glance”, and it is much closer to how teachers judge tools in practice.
What seems new
Claude Opus 4.6 appears to add value in two areas that matter for school planning. First, it handles multi-step work more steadily. Instead of treating a scheme of work as one large writing task, it can break the job into planning layers: unit intent, lesson sequence, assessment opportunities and resource suggestions. Second, the agent-team approach makes it easier to assign distinct roles. One agent can draft the sequence, another can challenge curriculum fit, and another can prepare a presentation-ready version.
That matters because many school planning problems are not really writing problems. They are coordination problems. A strong model needs to keep purpose, progression and practical classroom constraints in view at the same time. Earlier updates in this space have pointed in that direction, particularly around longer workflows and governance, as noted in this school briefing on Claude Opus 4.5.
The test
The task was to build a six-week half-term scheme of work for a mixed-attainment secondary class. The brief asked for clear lesson objectives, sensible sequencing, retrieval opportunities, formative assessment points and a final assessed outcome. It also required practical details teachers care about: likely misconceptions, approximate timings, low-preparation activity ideas and a PowerPoint export suitable for adaptation.
I judged the result against five teacher-ready criteria. The first was sequence quality: did each lesson build logically on the last? The second was coherence: did individual lessons feel teachable rather than generic? The third was curriculum fit: did the content stay anchored to the stated topic and age range? The fourth was edit load: how much rewriting would a teacher need before use? The fifth was slide readiness: did the PowerPoint output support briefing, teaching or departmental review?
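For departments that want to record these judgements consistently, the five criteria can be written down as a small rubric. The sketch below is illustrative only: the 1 to 5 scale, the threshold and the function name are my assumptions, not part of the test described here.

```python
# Hypothetical rubric: the five criteria come from the article,
# but the 1-5 scale and the threshold are assumptions for illustration.
CRITERIA = (
    "sequence_quality",   # does each lesson build on the last?
    "coherence",          # do lessons feel teachable rather than generic?
    "curriculum_fit",     # does content stay on topic and age range?
    "edit_load",          # 5 = very little rewriting needed
    "slide_readiness",    # does the deck support briefing or review?
)


def weakest_criteria(scores: dict, threshold: int = 3) -> list:
    """Return the criteria scored below the threshold, in rubric order."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"Missing scores for: {missing}")
    for name in CRITERIA:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"Score for {name} must be between 1 and 5")
    return [c for c in CRITERIA if scores[c] < threshold]
```

A reviewer could fill in one dictionary per scheme and use the returned list to decide where editing time should go first.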
Workflow setup
The workflow mattered almost as much as the model. I briefed the agents separately, with each one holding a narrow responsibility. The sequencing agent built the six-lesson arc. The curriculum-check agent reviewed for drift, repetition and missing prior knowledge. The classroom-practicality agent rewrote activities to reduce preparation burden. Finally, the presentation agent turned the agreed scheme into a slide deck structure.
This setup worked best when the briefing was highly explicit. Vague prompts produced polished but hollow planning. Clear constraints improved the result sharply. For example, specifying “one meaningful retrieval task per lesson” worked better than asking for “good assessment”. Asking for “activities possible with standard classroom resources” cut down unrealistic tasks. Schools already building evaluation habits around AI tools will recognise this pattern from broader readiness work such as one-week evaluation sprints.
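The role split and the explicit constraints above can be captured as plain data before any model is involved. This is a minimal sketch under stated assumptions: the dictionary keys, the wording of each brief and the `build_brief` helper are hypothetical, and nothing here calls a real agent or model API.

```python
# Hypothetical role briefs paraphrasing the four responsibilities described above.
ROLE_BRIEFS = {
    "sequencing": "Build the lesson arc for the half term.",
    "curriculum_check": "Review the arc for drift, repetition and missing prior knowledge.",
    "classroom_practicality": "Rewrite activities to reduce preparation burden.",
    "presentation": "Turn the agreed scheme into a slide deck structure.",
}

# The two explicit constraints quoted in the article.
EXPLICIT_CONSTRAINTS = [
    "One meaningful retrieval task per lesson.",
    "Activities possible with standard classroom resources.",
]


def build_brief(role: str, topic: str) -> str:
    """Assemble a plain-text brief for one role (illustrative format only)."""
    if role not in ROLE_BRIEFS:
        raise KeyError(f"Unknown role: {role}")
    lines = [
        f"Role: {role}",
        f"Topic: {topic}",
        f"Responsibility: {ROLE_BRIEFS[role]}",
        "Constraints:",
    ]
    lines += [f"- {constraint}" for constraint in EXPLICIT_CONSTRAINTS]
    return "\n".join(lines)
```

Keeping the constraints in one shared list means every role receives the same boundaries, which is the discipline that improved results in this test.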
Round 1 results
Without heavy intervention, Claude Opus 4.6 produced a scheme that was better than many first-draft AI plans, yet still short of true classroom readiness. The lesson sequence was usually logical. It introduced concepts in a plausible order, revisited key knowledge and built towards an outcome that felt connected to the unit. That alone is not trivial. Many models can generate six neat lesson titles, but fewer can create a sequence in which lesson four clearly depends on lesson two.
The lesson-level writing was more uneven. Some lessons included sharp, teachable moves: a retrieval starter linked to prior content, a modelled example, then guided practice and independent application. Others slipped into stock phrases such as “students explore the topic through discussion” without enough detail to judge timing or quality. In other words, the framework was often strong, but the teaching detail could still feel airy.
Where it was strong
The clearest strength was structure. Claude Opus 4.6 was good at producing a half-term arc that looked like it had been planned by someone who understands progression. It tended to front-load foundational knowledge, then widen into application, before returning to synthesis or assessment. For a head of department trying to create a workable starting point, that is genuinely useful.
It was also strong on reusable planning moves. Across the scheme, it repeatedly suggested routines teachers can actually use: retrieval quizzes, worked examples, hinge questions, short peer critique and exit tickets. None of these are revolutionary, but they are practical. In a busy week, practical beats inventive. This is where AI planning becomes valuable: not by inventing magical pedagogy, but by organising sensible teaching patterns consistently. That aligns with what many schools have found across broader AI adoption trends in what actually changed in school practice.
Where it fell short
The main weakness was vagueness hiding inside confident prose. A lesson might sound coherent while still leaving the teacher to invent the examples, texts or questions that make it work. That hidden clean-up time matters. If an AI-generated lesson saves ten minutes on structure but costs twenty minutes in clarification, the gain disappears.
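The trade-off in that last example is simple arithmetic, and it can be worth logging per lesson. The helper below is a hypothetical sketch; the function name and the idea of tracking minutes are my assumptions rather than something measured in this test.

```python
def net_minutes_saved(saved_on_structure: float, spent_on_clarification: float) -> float:
    """Minutes gained from the AI draft after subtracting clean-up time."""
    return saved_on_structure - spent_on_clarification


# The example from the text: ten minutes saved, twenty spent clarifying.
example = net_minutes_saved(10, 20)  # -10, so the lesson cost time overall
```

A negative result means the draft cost more time than it saved for that lesson.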
Curriculum drift was the second issue. Even with a clear brief, some lessons began to broaden the topic too far, as if the model wanted to make the unit more “interesting” by adding loosely related material. Teachers will recognise the danger immediately. A scheme can look rich while quietly losing focus. The curriculum-check agent reduced this, but did not remove it entirely.
There was also a recurring issue with challenge. The model often aimed for broad accessibility, which sounds helpful, but sometimes flattened the academic demand. Extension tasks were occasionally bolted on rather than woven into the main learning. For mixed-attainment classes, that means the scheme still needed a teacher’s judgement to sharpen ambition.
PowerPoint export
The PowerPoint export was more useful than a cosmetic extra, but only just. It worked well as a presentation layer for departmental discussion. The slides gave each lesson a clear title, objective, activity sequence and assessment note. For heads of year or subject leaders briefing colleagues, that is a real advantage. A scheme that can move quickly from planning document to meeting deck saves friction.
However, presentation readiness is not the same as teaching readiness. The exported slides were tidy, but they still reflected the strengths and weaknesses of the underlying plan. If the lesson detail was vague, the slide simply displayed that vagueness more neatly. In that sense, PowerPoint export is best understood as an efficiency feature, not a quality feature. It helps to package planning; it does not guarantee better planning.
Editing before use
How much human editing was still needed? In my judgement, enough that no sensible teacher should treat the output as finished, but not so much that the workflow becomes pointless. A solo teacher could probably take the scheme, tighten examples, add subject-specific resources and adapt it for class context in a reasonable amount of time. A department lead could use it as a draft for collaborative refinement. A non-specialist cover teacher, though, would still need more support than the scheme provided.
The edit load was lowest when the brief included strong curriculum boundaries and a clear output model. It rose quickly when the task relied on the model to infer too much. That is why governance and prompting discipline matter as much as model quality. Schools considering agent-based planning should pair experimentation with policies on data use, review expectations and accountability, much like the guidance explored in AI policy sprint packs for schools and school safety and evaluation checklists.
Best-fit use cases
The best fit is not “press a button and get next half term done”. It is more modest and more realistic. Departments can use Claude Opus 4.6 to generate a first-pass sequence for shared review. Heads of year can use it to coordinate cross-curricular themes or assembly series. Solo teachers can use it to escape the blank page and create a structured draft faster.
It is less suited to highly specialised planning, exam classes with tight specification demands or contexts where curriculum nuance is everything. In those cases, the model can still help, but mainly as an assistant to an expert, not a replacement for one.
Verdict
So, is Claude Opus 4.6 good enough for half-term planning next week? Yes, as a drafting partner. No, as an unsupervised planner.
That may sound cautious, but it is actually a strong result. The model produced a usable scheme skeleton, a sensible lesson arc and a presentable slide deck with less intervention than earlier tools usually required. The planning was often coherent enough to be worth keeping. Yet the final mile still belonged to the teacher: sharpening examples, correcting drift, raising challenge and matching the scheme to real pupils.
For schools willing to treat AI as a structured collaborator rather than an automatic planner, Claude Opus 4.6 looks genuinely promising. The gains are real when the workflow is disciplined, the criteria are explicit and human review remains non-negotiable.
Here’s to faster planning drafts and fewer blank-page evenings!
The Automated Education Team