
The allegation that DeepSeek may have trained on Claude outputs will sound, to some teachers, like a row between technology firms. Yet for schools, colleges and trusts, it is a useful warning. When an AI supplier cannot clearly explain how its model was trained, where its data came from, or whether outputs from another system were harvested at scale, the risk does not stay with the vendor. It can flow into procurement decisions, safeguarding reviews, legal exposure and long-term dependency. That is why schools need a stronger buying lens for AI, much as they already do for privacy, security and accessibility. If your team is reviewing AI governance more broadly, our piece on the EU AI Act and school procurement offers a helpful wider frame.
Why this matters
A school rarely buys an AI system because it admires the model architecture. It buys one because someone wants faster report writing, lesson-planning support, revision resources or admin automation. In practice, that means procurement teams often focus on price, features and whether the tool appears safe for pupil data. Those questions matter, but they are incomplete. If a vendor’s core model was built on disputed or poorly documented sources, the school may inherit service instability, legal uncertainty or sudden product changes if the supplier is challenged.
Imagine a department adopting an AI writing assistant for reports. It works well for one term, staff become dependent on it, and then the vendor removes key features while dealing with a licensing dispute. The immediate problem is not abstract intellectual property law. It is missed deadlines, retraining, confused staff and another emergency procurement cycle. We have seen similar dependency concerns emerge in other contexts, including changing access models and pricing pressure in mainstream tools, as discussed in this briefing on AI dependency risk.
Data laundering explained
“Data laundering” sounds dramatic, but the plain-English idea is simple. It describes a situation in which material that may be restricted, copyrighted or contractually protected is transformed or passed through other systems in a way that makes it look cleaner than it really is. In the current AI context, that could mean using outputs generated by one model as training material for another, then claiming the second dataset is merely “synthetic” or “model-generated” without addressing where those outputs originated.
For schools, the key point is not whether every allegation proves true. It is that “synthetic data” or “publicly available outputs” are not magic phrases that remove all concerns. If a supplier says, “We did not scrape protected content directly; we trained on generated outputs,” a sensible follow-up is: generated by whom, under what licence, and at what scale? A provenance problem does not disappear because one extra step has been inserted into the chain.
The missing question
Many school AI purchases still skip the most revealing question: can the vendor explain the lineage of the model and its training data in a way a non-specialist buyer can understand? Schools are often given polished statements about safety, innovation and productivity. They are less often given a clear account of upstream sources, licences, exclusions, restrictions and retraining practices.
Data provenance means being able to trace where training material came from, what rights attached to it, and how the supplier knows that use was lawful and contractually permitted. This is especially important when vendors mix open models, third-party APIs, fine-tuning datasets and customer prompts into one service. The procurement challenge becomes sharper with open-source and self-hosted options too, because “open” does not automatically mean “low risk”. Our article on open-source school software due diligence explores that distinction in more depth.
Output harvesting and licensing
Not all reuse is improper. AI companies can license datasets, buy access to content archives, use openly licensed materials, or negotiate explicit terms for model improvement. They may also train on customer data where contracts clearly permit it, though schools should be cautious here. Legitimate training and licensing involve documented rights, defined scope and a traceable basis for use.
Output harvesting is different. It describes a supplier gathering large volumes of responses from another model, especially where that collection breaches terms of service or sidesteps licensing fees. The practical issue for schools is that a vendor may present these outputs as a neutral dataset, even though they derive from another provider's system and may be subject to dispute. If the supplier cannot explain the difference between licensed training data and harvested outputs, that is a warning sign.
This matters in classroom-facing tools too. A school might see only a neat interface for feedback, report drafting or quiz generation. But beneath that interface may sit a chain of API calls, wrappers and fine-tuned models that no one in the school has mapped. That is one reason it is worth keeping a live inventory of AI tools and data flows, as suggested in our AI privacy audit checklist.
The risk to schools
When vendors cannot explain training data lineage, schools face several practical risks at once. The first is legal uncertainty. Even if a school is not the party accused of infringement, it may still face difficult questions from governors, parents or regulators about due diligence. The second is operational disruption if a tool is withdrawn, restricted or materially changed. The third is reputational harm. Leaders do not want to explain why a widely used pupil-facing tool was adopted without basic scrutiny of where its underlying intelligence came from.
There is also a quality risk. Vendors that are vague about provenance are often vague about evaluation, retention, subcontractors and model updates. Weak answers in one area tend to travel with weak governance elsewhere. If you are already comparing tools for writing support, it helps to look beyond headline features and ask how data protection, logging and audit trails are handled in practice, as we discuss in this comparison of AI assistants for report writing.
Ten questions to ask
A good procurement conversation should move from “What can your tool do?” to “What sits behind it?” Schools do not need to interrogate vendors like specialist litigators, but they do need direct questions. Ask which base models are used, whether any third-party outputs were used in training or fine-tuning, and whether the supplier can describe the lawful basis or licence for each major training source. Ask whether customer prompts or outputs are used for model improvement, and whether that is on by default.
Then go further. Ask what evidence the vendor can provide of training data governance, whether there are documented exclusions for protected or sensitive sources, and how disputes about upstream data are handled. Ask what happens if a core model provider changes terms, suspends access or alleges misuse. Ask whether the school will be notified of material model changes. Finally, ask the vendor to explain its red lines in plain language: what data it will never use, what sources it will never ingest, and what contractual restrictions it will not cross.
Contract points worth pushing
Schools should not settle for a friendly assurance in a sales call. If data lineage matters, it needs to appear in writing. A sensible contract should include a warranty that the supplier has the rights needed to provide the service and train or fine-tune any relevant models. It should include indemnities for third-party intellectual property claims, with wording that does not collapse the moment the school configures the tool in an ordinary way. It should also require prompt notification of claims, investigations or material disputes affecting the service.
Audit rights are worth pursuing, even if they are proportionate rather than unlimited. A school may not need direct access to raw datasets, but it can ask for independent assurance reports, documented model cards, subprocessor lists, and evidence of governance reviews. If a vendor refuses any meaningful transparency, that refusal is itself useful evidence for your risk assessment. Schools can also push for termination rights if there is a serious provenance dispute or an unapproved change in model provider. For leaders drafting or updating policy language this year, this AI policy sprint pack may help translate governance principles into usable clauses.
A red-amber-green view
A practical trust rubric can keep decisions grounded. A green vendor can identify its base models, explain training sources at a high level, confirm licensing or lawful use, state clearly whether customer data is used for training, and offer contractual commitments on notification and indemnity. An amber vendor gives partial answers, relies heavily on broad phrases like “proprietary methods”, or offers transparency only after contract signature. That may still be manageable for low-risk, non-pupil-facing uses, but it should trigger tighter controls.
A red vendor cannot or will not explain lineage, avoids written commitments, changes its story across meetings, or dismisses provenance concerns as irrelevant. Another red flag is aggressive discounting that pushes schools into rushed decisions before proper review, a pattern not limited to AI but increasingly visible in software sales. Our article on AI subscription dark patterns and procurement checks shows how commercial pressure can weaken due diligence if teams are not careful.
What to do this term
If your school already uses a potentially high-risk AI tool, do not panic and do not wait for a perfect policy rewrite. Start by identifying where the tool is used, what data goes into it, and whether it is essential or merely convenient. Ask the vendor the provenance questions above and request written replies. Review contract terms for training use, indemnities, notice periods and termination rights. If answers are weak, restrict the tool to low-risk tasks while alternatives are assessed.
It is also wise to separate immediate classroom utility from long-term platform commitment. A tool that helps with brainstorming may still be unsuitable for pupil data, assessment support or whole-school rollout. Procurement is not about banning innovation. It is about knowing what you are buying, what assumptions sit underneath it, and what happens if those assumptions fail. The DeepSeek–Claude allegation is a reminder that the most important AI question in schools is sometimes the oldest procurement question of all: can this supplier show its workings?
Here’s to clearer vendor answers and safer AI buying decisions.
The Automated Education Team