End-of-Year AI Audit: Evidence Pack

Turn AI pilots into decisions, evidence and next steps

[Image: A school leadership team reviewing an AI audit evidence pack]

What it is (and isn’t)

An end-of-year AI audit is a school-wide stocktake of every AI pilot, trial, or “quiet adoption” that has happened this year, followed by a disciplined decision: keep, stop, or scale. The outcome is not a glossy innovation report. It is an evidence pack you can put in front of SLT and governors that shows you understand impact, risk, cost, and next steps.

It also isn’t a witch-hunt. The audit should not be used to catch staff out for experimenting, nor to ban tools retrospectively because they feel unfamiliar. Done well, it protects professional judgement by making expectations explicit: what counts as acceptable use, what evidence is “enough”, and what must change before wider rollout.

Ownership matters. A practical model is joint ownership between a senior leader (to make decisions stick), a safeguarding or pastoral lead (to stress-test risk), and whoever holds data protection responsibilities. A digital lead or teaching and learning lead can run the process day to day, but the audit needs authority behind it. If you already run end-of-term reviews, you can align this with your existing cycle; the structure in our term review framework can be adapted into an annual version.

Set up in 45 minutes

Start with a single AI pilot register. Keep it lightweight, but complete enough that someone outside the project can understand what happened and why. In one short meeting, ask each pilot owner to name the tool, the use case, and the cohort. Then capture the data flow and cost, because these are the details that become urgent once you scale.

Your register should read like a map of “where AI touched the school”. For example, a department may have used an AI tool to generate first-draft feedback comments for older pupils, while a pastoral team trialled a chatbot for signposting wellbeing resources. These are very different risk profiles, even if both are labelled “AI”.

At minimum, capture: tool name and version; who used it (staff roles); which pupils were affected (directly or indirectly); what data went in (including whether pupil work was uploaded); what came out (resources, feedback, analytics); where it was stored; and what it cost (including hidden costs such as staff time, licences, and training). If you’ve been building repeatable routines, link each pilot to the workflow it supported; this makes it easier to decide what to standardise next year, as discussed in building AI workflows that stick.
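If your digital lead would rather keep the register in code than in a spreadsheet, the sketch below shows one way to structure it. The field names mirror the list above but are our suggestions, not a standard; a shared spreadsheet with the same columns does the job equally well.

```python
# A minimal sketch of an AI pilot register, assuming the fields suggested
# above; field names and the example tool are illustrative.
import csv
from dataclasses import asdict, dataclass, fields

@dataclass
class PilotRecord:
    tool: str              # name and version, e.g. "QuizBot 2.3" (hypothetical)
    use_case: str          # what it was used for
    staff_roles: str       # who used it
    pupils_affected: str   # cohort, affected directly or indirectly
    data_in: str           # what went in, incl. whether pupil work was uploaded
    data_out: str          # what came out: resources, feedback, analytics
    storage: str           # where inputs and outputs are stored
    cost: str              # licences, plus hidden costs such as staff time
    workflow: str = ""     # the routine it supported, if any

def write_register(records: list[PilotRecord], path: str = "ai_pilot_register.csv") -> None:
    """Export the register so someone outside the project can read it."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[fl.name for fl in fields(PilotRecord)])
        writer.writeheader()
        writer.writerows(asdict(r) for r in records)
```

Whatever format you choose, the test is the same: could a colleague who never touched the pilot reconstruct what happened from the row alone?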

Keep, stop, scale criteria

Your keep/stop/scale decision becomes defensible when it follows agreed criteria. The key is to avoid “we liked it” and replace it with “it met these thresholds, with these safeguards”. You can still allow professional judgement, but it should sit on top of a shared set of questions.

Impact should be defined in the language your school already uses. If the pilot aimed to improve writing quality, you might look for clearer structure, fewer misconceptions, or improved completion rates, not just “better work”. Workload should be specific too: did it reduce planning time, marking time, or admin time, and did it introduce new tasks such as checking AI outputs?

Equity is often where pilots quietly fail. A tool that helps confident pupils move faster may widen gaps if others lack access, vocabulary, or support. Integrity matters as well: if a pilot made it easier for pupils to submit unoriginal work, you may decide to stop it, or keep it only with redesigned tasks and clearer expectations.

Safeguarding and data protection should be non-negotiable gates, not “considerations”. If you cannot explain what data was shared, where it went, and who can access it, you are not ready to scale. Finally, value for money should include opportunity cost: a “cheap” tool that requires constant troubleshooting can be more expensive than a pricier, stable alternative. If you’re reviewing your tool landscape anyway, align the audit with an annual refresh such as AI tools refresh 2025.
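If it helps to make the criteria unambiguous, you can write the decision logic down explicitly. The sketch below is one possible encoding, assuming safeguarding and data protection act as hard gates and the remaining criteria are scored by the review panel; the thresholds are placeholders for whatever your school agrees.

```python
# A sketch of keep/stop/scale logic, assuming two non-negotiable gates
# and five panel-scored criteria (0-2 each). Thresholds are placeholders.

def decide(pilot: dict) -> str:
    # Gates first: fail either and the pilot stops, regardless of score.
    if not pilot["safeguarding_cleared"] or not pilot["data_protection_cleared"]:
        return "stop"
    # Scored criteria, rated 0-2 by the review panel (max total 10).
    score = sum(pilot[k] for k in ("impact", "workload", "equity", "integrity", "value"))
    if score >= 8:
        return "scale"
    if score >= 5:
        return "keep"   # keep at current size, with agreed safeguards
    return "stop"

print(decide({
    "safeguarding_cleared": True, "data_protection_cleared": True,
    "impact": 2, "workload": 2, "equity": 1, "integrity": 2, "value": 1,
}))  # -> "scale"
```

The point of writing it down is not automation; it is that "we liked it" can no longer masquerade as a criterion.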

Minimum evidence pack

The minimum viable evidence pack is a deliberate compromise: enough evidence to justify decisions, not so much that you spend the last weeks of term doing paperwork. Aim for a small set of artefacts per pilot that cover impact, workload, equity, safeguarding, and cost.

For impact, collect two or three concrete examples rather than broad claims. A department might include a short “before and after” sample of a resource, a small set of moderated pupil outcomes, or a simple rubric comparison across a class. For workload, a quick time estimate is often sufficient: “planning a quiz took 20 minutes instead of 60” is more useful than a long narrative, especially if you can show it was consistent across staff.

For equity, look for participation and access patterns. Did certain groups use the tool less, benefit less, or require more support? A short note from the SENCo or inclusion lead can be powerful evidence here, particularly if it identifies adjustments needed for next year. For safeguarding and data protection, include the tool’s key settings, what logging exists, and whether any incidents occurred. For cost, include licence costs, any device requirements, and the likely scaled cost if rolled out.

What not to collect: do not hoard pupil data “just in case”; do not export large logs without a clear purpose; and do not create new tracking that becomes permanent workload. The evidence pack is about decisions, not surveillance.
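A simple completeness check keeps the pack minimal without letting an area slip. The sketch below assumes the five evidence areas above, one entry each; the contents shown are illustrative.

```python
# A sketch of a per-pilot evidence checklist, assuming the five areas
# above. The goal is completeness, not volume: one entry per area.
REQUIRED_EVIDENCE = ("impact", "workload", "equity", "safeguarding_dp", "cost")

def missing_evidence(pack: dict) -> list[str]:
    """Return the evidence areas still empty for this pilot."""
    return [area for area in REQUIRED_EVIDENCE if not pack.get(area)]

pack = {
    "impact": "before/after resource sample; moderated Year 9 outcomes",
    "workload": "quiz planning: ~20 min vs ~60 min, consistent across 4 staff",
    "equity": "",   # SENCo note still to come
    "safeguarding_dp": "key settings recorded; no incidents logged",
    "cost": "licence per user; no new devices; likely scaled cost noted",
}
print(missing_evidence(pack))  # -> ['equity']
```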

Feedback without noise

Stakeholder feedback can easily become a pile of contradictory opinions. Keep it structured and time-bound. A staff pulse can be a five-minute survey with three questions: what did you use, what changed in your workload, and what would you warn others about? Follow up with one short focus conversation per phase or department to capture nuance, especially where staff had to check AI outputs carefully.

Student voice should be tied to the actual use case. If pupils used AI for feedback, ask whether it helped them improve and whether they understood what was expected of them. If AI influenced lesson resources, ask whether materials were clearer and more accessible. Avoid general “do you like AI?” questions; they generate heat, not evidence.

Parent and carer communications are often overlooked until something goes wrong. A short end-of-year note explaining what was trialled, why, and what is changing next year can build trust. Governors typically ask predictable questions: what data was shared, what controls exist, what the school will do if a vendor changes terms, and how academic integrity is protected. Prepare short, plain-English answers and include them in the evidence pack.

Risk review

Your risk review should include both today’s reality and tomorrow’s likely changes. Start with privacy and data protection: identify which pilots involved personal data, which relied on third-party processors, and whether data processing agreements or acceptable use terms were in place. Confirm retention periods and deletion routes, because “we can’t delete it” is a red flag.

Then look at vendor changes. Many AI tools update quickly, and features can appear mid-year that alter risk. A tool that was a simple text generator can become “agentic”, meaning it can take actions, connect to external services, or access broader data. Even if you do not enable these features, your audit should note whether they exist, who can switch them on, and how you would know if settings changed.

Logging and access control are your practical safeguards. Can you see who used the tool, when, and for what purpose? Are accounts tied to school identities, or are staff using personal logins? If a member of staff leaves, can you revoke access immediately? These details are unglamorous, but they are exactly what makes scaling safe.
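If you want those checks to be repeatable rather than ad hoc, they compress into a short red-flag list. The sketch below is illustrative, with field names we have assumed; any raised flag should block scaling until it is resolved.

```python
# A sketch of the unglamorous access, logging, and vendor checks,
# assuming a simple per-tool record. Field names are our assumptions.
def risk_red_flags(tool: dict) -> dict[str, bool]:
    return {
        "no_dpa_in_place": not tool.get("dpa_signed", False),
        "no_deletion_route": not tool.get("can_delete_data", False),
        "personal_logins": not tool.get("school_accounts_only", False),
        "no_usage_logging": not tool.get("has_audit_log", False),
        "agentic_features_unreviewed": tool.get("agentic_features", False)
            and not tool.get("agentic_reviewed", False),
    }

flags = risk_red_flags({
    "dpa_signed": True, "can_delete_data": False,
    "school_accounts_only": True, "has_audit_log": True,
    "agentic_features": True, "agentic_reviewed": False,
})
print([name for name, raised in flags.items() if raised])
# -> ['no_deletion_route', 'agentic_features_unreviewed']
```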

Finally, check policy alignment. If your AI practice has outpaced your written policies, summer is your chance to catch up. Keep an eye on updates and guidance through a living process such as AI policy watch, but translate any changes into school-specific expectations.

Summer action plan

The point of the audit is not merely to judge the past, but to set up the next academic year. A summer action plan works best when it is time-boxed and owned. Think in 30/60/90-day tasks: what must be ready before staff return, what can be built in the first half-term, and what should wait until routines settle.

In the first 30 days, focus on decisions and hygiene. Confirm which tools are kept, stopped, or scaled. Close accounts for stopped pilots, remove integrations, and delete data where appropriate. Draft or update the AI acceptable use expectations for staff and pupils, and prepare a short training note that explains “what good looks like” for the kept tools.

In the 60-day window, concentrate on capability. Plan CPD that is specific to your chosen workflows, not generic “AI awareness”. For example, if you are scaling AI-supported feedback, train staff on prompt discipline, bias checking, and how to keep feedback aligned with your marking principles. Procurement also belongs here: move from ad hoc purchases to approved tools with clear contracts, named admins, and agreed settings.

By 90 days, you should be embedding and evaluating. Build a light monitoring rhythm, such as a half-term check-in using the same register fields. Update policies that touch AI indirectly too, such as assessment integrity, homework expectations, and safeguarding reporting routes. Assign each task an owner and a deadline; without that, a “summer action plan” becomes wishful thinking.
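Owners and deadlines are easier to enforce when the plan is written somewhere inspectable. The sketch below assumes each task carries an owner and a day offset from an illustrative start date; swap in your own dates, names, and tasks.

```python
# A sketch of a time-boxed 30/60/90 action plan. The start date, owners,
# and tasks are illustrative placeholders.
from datetime import date, timedelta

START = date(2025, 7, 21)  # illustrative first day of the summer window

tasks = [
    # (description, owner, due within N days of START)
    ("Ratify keep/stop/scale decisions", "Head", 30),
    ("Close accounts and delete data for stopped pilots", "Digital lead", 30),
    ("Update AI acceptable use expectations", "DPO", 30),
    ("Plan workflow-specific CPD for kept tools", "T&L lead", 60),
    ("Move ad hoc purchases onto approved contracts", "Business manager", 60),
    ("Set half-termly check-in using register fields", "Digital lead", 90),
]

for desc, owner, days in tasks:
    print(f"{(START + timedelta(days=days)).isoformat()}  {owner:16}  {desc}")
```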

Reporting out

For SLT and governors, produce a one-page summary that sits on top of the evidence pack. It should state: how many pilots you ran; how many are kept/stopped/scaled; the top three benefits you saw; the top three risks you mitigated; and what decisions you need ratified (for example, procurement thresholds or policy updates). The tone should be calm and factual: you are demonstrating control, not selling a trend.
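The headline counts for that one-pager can come straight from the register. A minimal sketch, assuming each register row records its final decision:

```python
# A sketch of the summary tallies, assuming one recorded decision per
# pilot in the register. The example decisions are illustrative.
from collections import Counter

decisions = ["keep", "scale", "stop", "keep", "keep", "stop"]  # one per pilot
tally = Counter(decisions)
print(f"Pilots run: {len(decisions)}")
print(f"Kept: {tally['keep']}  Stopped: {tally['stop']}  Scaled: {tally['scale']}")
# Pilots run: 6
# Kept: 3  Stopped: 2  Scaled: 1
```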

For staff, write a separate “what changes next year” note. This is where you protect morale. Name the tools that are approved, the use cases that are encouraged, and the boundaries that keep everyone safe. Include a short example: “You may use Tool X to draft a quiz, but you must check accuracy and remove any personal data.” Staff do not need the full evidence pack, but they do need clarity, consistency, and a route to ask questions.

May your new academic year start with clear decisions and calmer systems.
The Automated Education Team
