The Ethical Quandary of AI Detection in Education

Why We Can't Rely on Technology Alone

AI generated image of an AI bot's head in a pondering pose. It is surrounded by various educators.

In the realm of education, the rise of artificial intelligence (AI) has been both a boon and a source of concern. As educators, we’ve witnessed the transformative power of AI in personalising learning, providing instant feedback, and even automating administrative tasks. However, with the advent of sophisticated AI text generators like OpenAI’s GPT-3.5 and GPT-4, a new challenge has emerged: the detection of AI-generated content in academic work. The question of whether a piece of text was crafted by a human or an AI has become a pressing issue, with significant ethical implications.

A Lesson from Digital Photography

The challenge of detecting AI-generated text can be better understood by examining the evolution of digital photography—a field that has undergone a transformation parallel to that of text generation. Let’s delve into this comparison to appreciate the intricacies involved.

In the nascent stages of digital photography, the tools for image manipulation were rudimentary. Digital images were often of lower resolution and quality compared to their film counterparts, making them easy to distinguish. When it came to editing, the limited software capabilities meant that alterations were typically conspicuous. Telltale signs such as pixelation, unnatural color saturation, and obvious clone stamping were common. Even to the untrained eye, these digital footprints were noticeable, and the authenticity of a photograph could be assessed with relative ease.

The Photoshop Revolution

AI generated image of a woman using a computer with photo editing software. The image being edited is a photo of a gorge with a river running through it.

The advent of Adobe Photoshop in the late 1980s marked a turning point. This powerful software brought sophisticated editing tools to the masses, enabling users to manipulate images with an unprecedented level of precision. Photoshop allowed for the seamless removal of blemishes, alteration of lighting and colors, and even the complete restructuring of a scene. As the software evolved, it introduced layers, masks, and a plethora of advanced features that enabled artists and photographers to create images that were limited only by their imagination.

The digital photography landscape has evolved dramatically with the advent of modern DSLRs (Digital Single-Lens Reflex cameras) and mirrorless cameras that offer built-in image manipulation features. These advanced cameras come equipped with powerful processors and software that allow photographers to apply a variety of edits directly within the camera, even before the image is transferred to a computer for further processing.

Modern DSLRs and mirrorless cameras often include options for in-camera HDR, where multiple exposures are taken and combined to create an image with a higher dynamic range than what a single exposure could capture. This process, which was once a post-processing task requiring specialized software, can now be accomplished with the push of a button.

Additionally, these cameras may offer built-in filters and effects that can mimic the look of different film stocks, apply color grading, or even create miniature or fisheye lens effects. Some cameras also provide the ability to retouch images, such as removing red-eye, softening skin, or correcting lens distortions, all without the need for external software.

AI generated image of a circuit board with a central chip labelled 'AI'.

Another significant advancement is the ability to shoot in RAW format, which records all the data from the camera sensor without in-camera processing. This gives photographers the maximum amount of information to work with during post-processing, allowing for a greater degree of control over the final image. However, even within the camera, users can adjust parameters such as white balance, exposure, and noise reduction on RAW files before exporting them as processed JPEGs.

Some of the latest camera models even incorporate AI-driven features that can recognize scenes and subjects, automatically adjusting settings like focus, exposure, and color balance to optimize the image. These intelligent systems can detect faces and eyes to ensure sharp focus in portraits or adjust the exposure to capture the best possible detail in highlights and shadows.

Modern photo editing is no longer just about correcting flaws; it’s an art form in itself. Professional retouchers can spend hours, even days, perfecting an image, making subtle adjustments that most viewers would never notice. The edits are so meticulous that they replicate the natural variances found in unaltered photographs, such as the interplay of light and shadow, the texture of materials, and the nuances of human skin. This level of detail makes it incredibly challenging to discern whether a photo has been edited, even for experts.

The Implications for AI Text Detection

Drawing a parallel to AI-generated text, we find ourselves in a similar situation. Just as digital photography has evolved to a point where edits are imperceptible, AI text generators have reached a level of sophistication where their output can be indistinguishable from human writing. The early signs of AI-generated text—awkward phrasing, nonsensical sentences, and repetitive language—are no longer reliable indicators. Advanced models like GPT-3.5 and GPT-4 can produce text that flows naturally, incorporates nuanced ideas, and even mimics specific writing styles.

The digital photography analogy underscores the complexity of the task at hand. Just as photo retouching has become an art that can defy detection, so too has the generation of AI text become a craft that can evade the most vigilant scrutiny. This comparison not only illustrates the challenges we face in distinguishing AI-generated content but also serves as a cautionary tale about the over-reliance on technology to solve problems that are inherently human and subjective.

The Evolution of Text Generation

The trajectory of AI in text generation mirrors the digital photography landscape. Before the era of GPT-3.5/4, AI-generated text could sometimes be identified by its rigid structure and occasional gibberish. Early AI in the text space consisted of spelling and grammar checkers, which evolved into the auto-correct and predictive text features that became ubiquitous with the rise of smartphones.

Now, with GPT-3.5/4, we have AI that can generate text virtually indistinguishable from human writing. The sophistication of these models means that spelling, grammar, punctuation, and context are typically flawless—often surpassing the capabilities of the average middle schooler.

The Pitfalls of AI Content Detection

AI generated image with various negative symbology around the edges, such as a burning container, warning and error signs. In the center is an AI bot in despair.

Despite the advanced capabilities of AI, detecting AI-generated content is fraught with challenges. Initially, educators might flag overly polished work as suspiciously advanced. However, the real issue arises when students engage in prompt engineering: fine-tuning AI output to mimic a specific style or to intentionally include errors.

Experts Acknowledge the Limitations

OpenAI itself has acknowledged the limitations of AI detection tools. In a blog post, they stated:

“Our classifier is not fully reliable. In our evaluations on a ‘challenge set’ of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as ‘likely AI-written,’ while incorrectly labeling human-written text as AI-written 9% of the time (false positives).” OpenAI

The company later removed the classifier due to its low accuracy rate, highlighting the difficulty of creating a reliable detection system.
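OpenAI's published figures make the practical consequences easy to quantify. With a 26% true-positive rate and a 9% false-positive rate, Bayes' rule shows that unless a large share of submissions really are AI-written, most flagged essays will be human work. A minimal sketch (the prevalence values are illustrative assumptions, not data from OpenAI):

```python
def flagged_is_ai(prevalence, tpr=0.26, fpr=0.09):
    """P(essay is AI-written | detector flags it), via Bayes' rule.

    tpr and fpr default to the rates OpenAI reported for its
    (now withdrawn) classifier; prevalence is an assumption.
    """
    p_flag = tpr * prevalence + fpr * (1 - prevalence)
    return (tpr * prevalence) / p_flag

# Illustrative prevalence values only: 5%, 20% and 50% of essays AI-written.
for prevalence in (0.05, 0.20, 0.50):
    print(f"{prevalence:.0%} AI-written -> "
          f"{flagged_is_ai(prevalence):.0%} chance a flagged essay is AI")
```

Even if one in five essays were AI-written, a flag from this classifier would have been wrong more often than right.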

The Case of Turnitin

Turnitin, a widely used plagiarism detection service, introduced an AI detection tool that was met with significant concern. Vanderbilt University, after testing the tool and consulting with other institutions, decided to disable it, stating:

“Instances of false accusations of AI usage being levelled against students at other universities have been widely reported over the past few months, including multiple instances that involved Turnitin.” Vanderbilt University

Bias Against Non-Native English Writers

Stanford researchers have found that AI detectors are biased against non-native English writers, incorrectly flagging a significant share of essays by non-native English students as AI-generated. Stanford HAI
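The Stanford study attributes much of this bias to detectors' reliance on text perplexity: non-native writers tend toward a narrower, more common vocabulary, which reads as more "predictable" and therefore more machine-like. The toy scorer below is a hypothetical illustration of that failure mode, not any real detector's algorithm; it uses type-token ratio as a crude stand-in for predictability:

```python
def type_token_ratio(text):
    """Crude lexical-diversity proxy: unique words / total words.

    Real detectors score model perplexity, but the failure mode is
    analogous: constrained, repetitive vocabulary scores as more
    'machine-like', regardless of who actually wrote the text.
    """
    words = text.lower().split()
    return len(set(words)) / len(words)

# Two human-written sentences; only the vocabulary range differs.
varied = "The committee deliberated extensively before reaching a nuanced verdict"
constrained = "The group talked a lot and then the group made a choice"

# The writer with the smaller vocabulary gets the lower, more
# 'AI-like' score, despite both sentences being human-written.
print(type_token_ratio(varied), type_token_ratio(constrained))
```

A scorer built on this kind of signal penalises exactly the writing patterns common among non-native English students.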

The Ethical Implications

AI generated image of a computing device with an AI character on the screen. It says in large print 'Felse posintives' (AI misspelling).

The use of AI detection tools raises serious ethical concerns. False positives can have dire consequences for students, including unwarranted accusations of academic dishonesty. The case of William Quarterman, a student at the University of California, Davis, who was falsely accused of cheating based on AI detection software, underscores the potential for harm. USA Today

Moreover, the reliance on AI detection tools can undermine the trust between educators and students. The presumption of guilt can erode the relationship between teachers and learners, fostering an environment of suspicion and hostility. Educators must strike a delicate balance between maintaining academic integrity and respecting students’ rights and dignity.

Detector Avoidance

As students become more adept at manipulating AI-generated content, they may find ways to evade detection. Techniques such as prompt engineering, introducing intentional errors, or blending AI-generated text with their own writing can make it challenging for detection tools to identify AI content accurately.

Below are a few examples of content generated by GPT-4, completely unmodified. In all cases, the prompt was along the lines of “Rewrite the suffragette speech as a student in year X, including grammatical and spelling errors”:

Example 1

Hlelo eervyone,

My naem is Jamie and toady I’m giong to takl about soemthing raelly importnat. It’s about wemon and how tehy shuold be albe to vote just lkie the men. A wihle back, wich is raelly not fair, wemon cudn’t vote at all. It was only for men, and that’s not rihgt coz grils are just as clever as boys, rihgt?

So, there was this ruel that said if you’re a gril, you can’t vote. That’s lkie saying grils can’t play footie or wear cool kicks, wich is just daft. And that menas all the mummys and their dauters and even their dauters’ dauters didn’t get to pik who makes the ruls. That’s lkie, haf of all the pepole!

This guvment didn’t let the wemon say what tehy think. So it wasn’t a reel democrazy, coz in a democrazy evryone shuold get to have a say. It was more lkie an… aristo… aristo… a big werd that means only some pepole get to be the boss coz they’re rich or coz they’re boys.

It’s lkie, if you had a famly and the dad and the broters and the husbend and the sons got to pik all the fun stuff, and the mum and sistrs and wife and dauters just had to do what tehy say. That wuold make evryone raelly cross, rihgt?

But now, wemon can vote too! It’s much beter coz evryone gets a choice, and that’s raelly importnat for being fair. So, we shud all rember how hard wemon had to fihgt to be albe to vote and make sure we alwys be kind and fair to evryone, no mater if they’re a boy or a gril.

Fanks for listning to my speach. I hpe you thnik it was gud!

Bye!

Example 2

Hello everybudy,

My name is Jamie and I’m in year for. I’m here to talk to you about something super important. It’s about girls and ladies being able to vote, which is called suff… suffra… suffrage, yeah that’s it. So, a long time ago, women couldn’t vote at all. That’s really not fair, coz like, half of all the peepol are women!

There’s this really tricky word called “dis…disfran…disfranchi…disfranchisement.” It’s when peepol are not aloud to vote. And if only men can vote, then it’s like saying only they can make all the rules and that’s not fair to the girls and ladies. It’s like, the goverment is not listening to them at all. And that’s not what democracy is about, right? Democracy means everyone gets a say, not just some peepol.

When only men are in charge, it’s like having an aris… aristo… a big fancy group of peepol who think they’re better than everyone else. It’s like saying dads and brothers and husbands and sons are the bosses of mums and sisters and wives and daughters. And that’s not nice, coz it makes everyone argue at home and that’s sad.

So, I think it’s really important that girls and ladies can vote too. Because everyone should be treated the same and have the same chances to choose what happens in our country. And that’s what I wanted to say today. Thank you for listening to me.

Bye!

This is a simple case of prompt engineering: the student has intentionally introduced errors to mimic the writing style of a younger student. The techniques can be more sophisticated still, with students blending AI-generated content with their own writing to create a seamless piece that evades even the most capable detection tools.

It is easy to imagine a future in which AI web apps built specifically for cheating are widely used by students. Such an app could ingest a student's own past work and generate new content tailored to mimic that student's writing style, making it virtually impossible for even the most astute educator to detect. Systems of this kind already exist; they are just not yet easily accessible to the average student.

Strategies for Educators

The rapid advancement of artificial intelligence (AI) in education has brought about a new set of challenges for educators. With AI’s ability to generate text that is indistinguishable from human writing, the traditional methods of detecting academic dishonesty are no longer reliable. As we’ve seen, attempts to use AI to detect AI-generated content have proven to be fraught with ethical issues and inaccuracies. So, what can educators do to effectively address this issue? Here are some strategies that focus on the human element of teaching and the development of critical thinking skills, rather than over-reliance on flawed detection technology.

Emphasize the Learning Process Over the Product

One of the most effective ways to tackle the use of AI in cheating is to shift the focus from the final product to the learning process itself. Encourage students to submit drafts, outlines, and notes that document the progression of their work. This not only deters the use of AI but also promotes a deeper engagement with the material.

Foster a Culture of Integrity

Creating a classroom environment that values honesty and original thought can go a long way in preventing academic dishonesty. Discuss the importance of academic integrity and the value of authentic learning experiences with your students. Make it clear that the goal of assignments is not just to produce a paper, but to engage in a meaningful learning process.

Utilize Open-Book, Open-Note Assessments

Open-book or open-note assessments can reduce the temptation to use AI-generated content. These types of assessments encourage students to understand and apply knowledge rather than simply recall information, making it more difficult for AI to be used effectively.

Implement Oral Examinations or Presentations

Oral exams and presentations require students to demonstrate their understanding of the material in real-time. This format can be an effective way to assess genuine comprehension and the ability to articulate thoughts, which is much harder to fake using AI.

Design Project-Based Learning Activities

Project-based learning (PBL) activities require students to apply their knowledge to real-world problems, often in a collaborative setting. The complexity and uniqueness of these projects make them difficult for AI to replicate, encouraging students to rely on their own skills and creativity.

Encourage Peer Review and Collaboration

Incorporate peer review sessions where students can give and receive feedback on their work. Collaboration not only helps students learn from each other but also creates a sense of community and shared responsibility for maintaining academic standards.

Stay Informed and Adapt

As AI continues to evolve, so too must our strategies for dealing with it. Stay informed about the latest developments in AI and be prepared to adapt your teaching methods accordingly. Engage in professional development opportunities that focus on AI literacy and ethical considerations.

AI generated image of a futuristic classroom. Students sit around a single large hologram type desk, they are all wearing goggles and head phones.

As we stand at the crossroads of technological advancement and educational integrity, it is imperative that we, as educators, navigate this new terrain with both prudence and care. The allure of AI as a tool for detecting academic dishonesty must be weighed against the potential harm of false accusations and the undermining of trust within our educational communities.

The ethical quandary posed by AI detection tools is not one to be taken lightly. The consequences of mislabelling a student’s work can be far-reaching, affecting their academic record, self-esteem, and future opportunities. It is our responsibility to ensure that the methods we employ to uphold academic integrity are just, equitable, and free from bias.

In this rapidly evolving landscape, we must be vigilant and proactive. We must engage in open dialogue with our students about the ethical use of AI, foster environments that encourage originality and critical thinking, and remain adaptable to the changing nature of education in the digital age.

The journey ahead is uncharted, but by prioritising the human element of education—our relationships, our judgement, and our commitment to our students—we can steer a course that respects the dignity of all learners and upholds the values at the heart of our profession.

At Automated Education, we understand the complexities and challenges that AI brings to the educational sector. Our AI-powered educator assistant is designed to support teachers in enhancing the learning experience, not to police the authenticity of student work. We believe in leveraging technology to empower educators and students alike, fostering a collaborative and innovative learning environment.

As we continue to explore the capabilities and implications of AI in education, let us do so with a spirit of collaboration, understanding, and ethical consideration. Together, we can harness the power of AI to enrich education while maintaining the integrity that is the foundation of our shared pursuit of knowledge.

Further Reading

Testing of detection tools for AI-generated text

The paper critically evaluates the effectiveness of tools designed to detect AI-generated text, particularly in academic settings, by testing their ability to distinguish between human and ChatGPT-produced content. It reveals that current detection methods are largely inaccurate and biased toward misidentifying AI text as human-written, with obfuscation techniques further degrading their performance. The study is notable for its extensive testing of various tools, including Turnitin and PlagiarismCheck, and highlights the challenges and limitations of relying on such tools in educational contexts.

Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S. et al. Testing of detection tools for AI-generated text. Int J Educ Integr 19, 26 (2023). https://doi.org/10.1007/s40979-023-00146-z

Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text

This study assesses the effectiveness of AI content detection tools in differentiating between content produced by ChatGPT models 3.5 and 4 and human-written text on the subject of cooling towers in engineering. The tools tested include those from OpenAI, Writer, Copyleaks, GPTZero, and CrossPlag. Results indicate that while these tools are better at detecting content from GPT 3.5, they struggle with GPT 4 and show inconsistencies with human-written text, leading to false positives. The findings highlight the necessity for improved AI detection technologies as AI-generated content becomes increasingly indistinguishable from human-authored work.

Elkhatat, A.M., Elsaid, K. & Almeer, S. Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text. Int J Educ Integr 19, 17 (2023). https://doi.org/10.1007/s40979-023-00140-5

Can AI-Generated Text be Reliably Detected?

The paper addresses the challenge of detecting AI-generated text to prevent misuse of large language models (LLMs) in creating plagiarism, fake news, and spam. It demonstrates that current detection methods, including those using model signatures, watermarking, and various classifiers, are ineffective against sophisticated attacks like recursive paraphrasing. The study shows that such attacks can bypass detection while maintaining text quality and can even trick detectors into mislabeling human text as AI-generated, posing risks to developers’ reputations. Additionally, the research provides a theoretical framework to assess the inherent difficulty of detecting AI-generated text and offers public access to the code used in the experiments.

Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi. Can AI-Generated Text be Reliably Detected? arXiv:2303.11156 [cs.CL] (2023) https://doi.org/10.48550/arXiv.2303.11156
