A.I. puts a new spin on the classic problem of cheating in higher education. To counter these challenges, I offer some ideas on how to develop assignments that account for the caveats these content-generating platforms introduce. Taken together, this should promote a shift in the fidelity of assignments as well as in the role of the instructional leadership team for any course.

As a baseline, I will first use ChatGPT to cheat on an assignment. Then, I will try to use GPTZero to determine whether I had cheated in the first place. I conclude with ways to mitigate the cat-and-mouse game of GPT creation and detection, along with considerations for how the instructional team can use these opportunities to foster professional mentorship and create meaningful assignments.

Let’s begin with our first assignment: generating a Wordle clone in Python.

Generate a four-letter word

Recently I was browsing GitHub and came across a fun Wordle clone, written in rather simple Python. For context, I have never played Wordle. So I did a little research and tried to figure out if this solution was a faithful implementation of the game. (It was.)

While browsing, I encountered a few similar examples, also in Python. One particular implementation caught my eye, not because of the solution, but because the author had written a working example, hit an issue, couldn’t explain the issue, and everyone else rightly pointed out that their code did, actually, work just fine. You see this kind of behavior with undergraduate code, where a well-intended student is learning a language and needs some support and maybe some empathy.

But you also see this with undergraduate plagiarism. The latest version of undergraduate plagiarism is the content-generating features of platforms like ChatGPT, which are free and publicly available.

As someone who is well out of school, I wondered: could someone really use ChatGPT to cheat without getting caught? To answer this, we can “think like an adversary,” looking for the conditions that allow a problem to manifest. Then, we can provide recommendations on how to prevent those conditions from arising in the first place.

So, let’s assume that the assignment has a simple parameter: “generate a Wordle game.” In ChatGPT, we can do so with the following query:

Write a Python3 Wordle game

You are welcome to reproduce the results on your own. Here were some common patterns in each solution:

  • No code comments. This included block comments for function definitions.
  • No empty newlines. This interested me because Python encourages spacing for readability.
  • Only one blank line between functions or classes. This is interesting because PEP 8 calls for two blank lines between top-level classes or functions in a file.
  • Most of the solutions used if __name__ == "__main__". This fascinated me because a lot of junior programmers don’t use this.
  • They all imported the random module and used a remarkably not-random function. We might want to encourage our developers to use functions that call system-level randomness (e.g., /dev/urandom) instead of simple pseudorandom functions.
  • None of the solutions were object-oriented unless explicitly requested. Even then, ChatGPT’s class-based solution was just a wrapper around the functions used in its non-object-oriented version. (A sketch illustrating these patterns follows this list.)
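
For illustration, here is a minimal sketch in the spirit of those generated solutions. It is not verbatim ChatGPT output; the word list, function names, and feedback symbols are placeholders of my own, and I have added comments and normal spacing (both of which the generated code lacked) to keep it readable:

    import random

    WORDS = ["crane", "slate", "ghost", "bread", "pride"]  # placeholder word list

    def choose_word():
        # The generated samples all used random.choice(); something like
        # secrets.choice() or random.SystemRandom() would draw on system-level
        # entropy (e.g., /dev/urandom) instead.
        return random.choice(WORDS)

    def score_guess(guess, answer):
        # Per-letter feedback: G = right letter, right spot; Y = right letter,
        # wrong spot; _ = letter not in the word.
        feedback = []
        for i, letter in enumerate(guess):
            if letter == answer[i]:
                feedback.append("G")
            elif letter in answer:
                feedback.append("Y")
            else:
                feedback.append("_")
        return "".join(feedback)

    def main():
        answer = choose_word()
        for _ in range(6):
            guess = input("Guess a five-letter word: ").strip().lower()
            if len(guess) != 5:
                print("Please enter exactly five letters.")
                continue
            print(score_guess(guess, answer))
            if guess == answer:
                print("You win!")
                return
        print("Out of guesses. The word was " + answer + ".")

    if __name__ == "__main__":
        main()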

In sum, they were all remarkably similar to other solutions found online. But, let’s assume good intentions—for now.

Next, I went to GPTZero and pasted in a few of the generated solutions. In every case, it reported a 51% chance that the content was, in fact, generated by a GPT engine. This surprised me at first, since I had hoped for, well, a much higher score from code that I had just generated for this exact purpose.

Granted, most of these “GPT detectors” are written with prose in mind. You are welcome to try a similar exercise with a more traditional type of paper or what have you. The point is that these results don’t hold up as well against code, where the rules are well-defined. You could probably engineer something that detects code plagiarism specifically, but such tools are not nearly as available to the everyday, underfunded professor.
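
As a crude illustration of what “engineering something” might look like, here is a hypothetical token-based similarity check. Real code-plagiarism systems (MOSS, JPlag, and the like) are far more sophisticated, and nothing below reflects how GPTZero actually works:

    import io
    import keyword
    import tokenize

    # Token types that carry formatting rather than structure.
    SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT}

    def token_stream(source):
        # Reduce Python source to a coarse token sequence: comments and blank
        # lines disappear, identifiers and literals collapse to their token
        # type, while keywords and operators keep their text.
        out = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type in SKIP:
                continue
            if tok.type == tokenize.OP or keyword.iskeyword(tok.string):
                out.append(tok.string)
            else:
                out.append(tokenize.tok_name[tok.type])
        return out

    def ngrams(tokens, n=4):
        return {tuple(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 0))}

    def similarity(source_a, source_b, n=4):
        # Jaccard overlap of token n-grams: 1.0 means structurally identical code.
        a, b = ngrams(token_stream(source_a), n), ngrams(token_stream(source_b), n)
        return len(a & b) / len(a | b) if a and b else 0.0

Calling similarity(submission_a, submission_b) returns a value between 0 and 1, and because the comparison works on token structure rather than raw text, stripping comments or blank lines (the tricks described below) would not move the score.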

Let’s chew on this situation for a minute. If we are generating undergraduate code, it’s probably not that complex anyway; I could imagine this Wordle game as a simple, non-summative assessment. Unlike prose, code is understandably more rigid and defined by clear, fixed rules. You’re less likely to get the same degree of variance between “Python Wordle game” implementations than between, say, attempts to “write a novella that feels like Annihilation.” Finally, the best we can do with a free checker is get a result that is way below our expectations.

So, 51% is our magic number.

Cheating the Polygraph

After this, I wanted to know what score it would give for some of these online solutions. I took one of the earlier code examples and ran it through GPTZero.

The AI Scan score was 36%: “likely to be written by a human with a few A.I. sentences.” The plagiarism score read zero.

I looked back at one of the solutions I had generated earlier, which looked very similar to this one. The differences were subtle: the A.I. code was missing comments, had no blank lines, and defined the main procedure in the global namespace (unscoped).

With this in mind, I went back to the code in question and did the following:

  • Removed comments and empty newlines
  • Left only one blank line between functions and global variables (both edits are mechanical; see the sketch after this list)
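
The helper below is a hypothetical convenience rather than part of my original workflow; it simply automates those two edits by dropping full-line comments and collapsing runs of blank lines down to one:

    import sys

    def strip_style(source: str) -> str:
        # Drop full-line comments and collapse consecutive blank lines,
        # mimicking the two manual edits described above.
        cleaned = []
        previous_blank = False
        for line in source.splitlines():
            stripped = line.strip()
            if stripped.startswith("#"):
                continue  # full-line comment: remove it
            if stripped == "":
                if previous_blank:
                    continue  # second blank line in a row: remove it
                previous_blank = True
            else:
                previous_blank = False
            cleaned.append(line)
        return "\n".join(cleaned) + "\n"

    if __name__ == "__main__":
        # Usage: python strip_style.py < generated_wordle.py > stripped_wordle.py
        sys.stdout.write(strip_style(sys.stdin.read()))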

I ran the result through GPTZero. The A.I. detection score went up to 49%, “moderately likely” to be written by A.I. Interesting.

I made one final modification. In the code in question, there is a duplicate variable: one in the global namespace, and one in the scope of the main application logic. Most of the GPT examples kept variables in local scopes. So, I removed the global variable. The score rose to 50%.
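
In isolation, that change looked something like the following (the variable and function names here are hypothetical, not taken from the original repository):

    # Before: the secret word is defined twice, once at module level
    # and once inside the main game logic.
    secret_word = "crane"

    def play():
        secret_word = "crane"
        ...

    # After: the variable lives only in the local scope, matching the
    # structure of the GPT-generated samples.
    def play():
        secret_word = "crane"
        ...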

So, we’re within 1% of the target score. What does that mean?

In my honest view, it doesn’t mean much.

Gotcha?

Suppose your professor applied the same methodology to each and every student in their classes (which they almost certainly do not). Would this be grounds for an Academic Integrity committee to take action against a student? Would these numbers be sufficient to identify a student as cheating via ChatGPT?

Honestly, it doesn’t matter to me. In fact, this whole dialogue emphasizes a few root problems that run much deeper than using LLMs to pass your undergraduate degree.

First, undergraduate code is notoriously simple from the outset. If A.I. can generate better code solutions than your students, and that code can even evade plagiarism checkers with minor tweaks, it might be time to update your curriculum. In fact, there’s a growing trend of job postings explicitly stating that “academic code” is not an acceptable criterion for employment; we might wonder if this is largely driven by recent trends of students using A.I. to generate simple solutions for simple problems.

Next, the GPTZero tool wasn’t even reliable for code that I had just generated from ChatGPT. A 51% success rate is abysmal. However, it may prove challenging (financially or otherwise) for a university to get their hands on a robust tool that can detect if an LLM is used to generate student code solutions. How can we really hold anyone accountable if the tools we’re using for accountability are intrinsically unreliable? This fact alone is enough to warrant some concern and, frankly, to garner some empathy.

Another major concern here is how academia may react to this. If educators choose to ignore the problem, students will almost certainly continue to generate their code, making just enough changes to evade detection. On the other hand, if the detection mechanisms remain close to useless, educators are powerless to really know if a student has cheated.

Regardless, it’s a lose-lose for students. The only “gotcha” is that someone has just paid for a degree and learned close to nothing new from it. Computer Engineering or Information Technology curricula should never find themselves laughably behind the times.

Ideas for A.I. in the classroom

Not long ago, postsecondary courses shunned cellphones. Educators would often tell you to shut yours off or leave it in your car. Now, the cellphone is integral to learning opportunities and is a primary means of communication with your professor and other supporters at the institution.

In the same way, A.I. can be included in the classroom to enhance a student’s understanding of code solutions. Here are some meaningful ideas that educators could begin to explore:

  • AI Code Reviews. Give everyone in the class a prompt for ChatGPT. Tell them to skim the generated code and identify any problems they observe. Since each student will generate a somewhat unique solution, their answers should have enough variance to prove that they were, at least, working with unique code. Code reviews can also encourage students to identify code-quality or security issues in the AI-generated code.
  • AiOps. AI is becoming widely used in DevOps and DevSecOps processes. For example, GitHub Advanced Security includes AI features with its CodeQL and GitHub Copilot tools. The student could explore ways to include AI for better automation. In fact, this could be accomplished in a purely theoretical context, without the need to implement anything concrete; either way, you should expect that your students will have opportunities to apply AI, given the kinds of job openings that currently exist. Examples of an appropriate deliverable might include an architecture diagram for CI/CD pipelines, or a workflow diagram showing where AI can be used in an existing DevOps or DevSecOps process.
  • Exploit Development with AI. Ask students to use ChatGPT to create an exploit for a problem. (The “problem” could be based on something as simple as a CWE or something more complex like an entire CVE.) Have them test their solution and document their findings: did the exploit work, why did it fail, and so forth. Allow them to identify the key components in the generated solution that may have had more merit than others. Finally, have them revise the generated solution so that it can, in fact, deliver a payload or exploit a vulnerability/weakness as intended. This can overlap with a variety of cases, including network security, application security, and binary exploitation.
  • AI Threat Detection. Have students research current trends in AI threat detection. Use ChatGPT to try to replicate the results; document every failure or shortcoming of that approach. Then, have them propose an LLM (or identify an existing one) that can overcome the problems. The student should use current research to support their decisions. This task could also address the problem of “deepfake” accounts on platforms such as LinkedIn.
  • AI/LLM exploitation. Use well-known anecdotes about people who could “trick A.I.” into leaking information. Have students try to reproduce those results and explain why they think the attempt failed or succeeded. Further, have them propose their own ideas for AI exploitation. Content can be supported with current research, which is growing by the day.
  • Interview your students. For group projects, take a week where the students present their idea to you and your TAs directly. Ask them strategic, deliberate questions about their ideas or design choices. This should very much resemble a real-life job interview, where the interviewer may ask the candidate to prove that they really have mastery over a topic. Job interviews are usually where students who cheated their way through university get identified and filtered out, often very quickly.

The big takeaway for all of these ideas is that they require students to prove that they understand how the LLM is working. In addition, many of them lead to unique starting points for the assignment; unlike a simple code solution, the student is required to provide a novel solution.

Any of these topics could be at least attempted with AI, but those results are generally of an unacceptable quality. Of course, if you find that something like ChatGPT is able to defeat your questions, then it’s time to take your assignment back to the drawing board. In fact, that should be the baseline for whether you created an appropriately challenging question in the first place.

However, these LLMs are still just machines that are defined by rules, which can always be defeated given enough effort (or, in many cases, with a bare-minimum effort). Deliverables like architecture diagrams or in-depth questioning may prove less feasible to generate. Conversely, they may also stand a higher chance of being detected by a traditional plagiarism detector.

Industry leaders are often okay with A.I. tools, so long as they are used strategically. The strategic use of these platforms entails an understanding of when and why to use them. This will put your students at a much better vantage point when they are applying for a wide range of jobs.

Mindset-shifting

If you’re fixated on all of this as a punitive measure, then you might be missing the point. The goal here is to provide your students with a meaningful experience and a rich education. College is an investment of time and money, and students deserve to get the most out of their investments. Holding your students to high standards is only meaningful if you also provide high support for them to reach those standards and to foster their goals.

In my view, the best antidote to the AI problem is a question of mindset: did you, as the educator, challenge yourself to create assignments that cannot be answered by widely available freeware? And further, did you do your due diligence to hold your students accountable for their own success?

This is where you, as the instructional lead, will need to hold yourself to the same high standard that you expect from your students. This may require ditching the same assignments you’ve given for the past ten years. It may also require you to lean on your instructional aides in a collaborative manner, becoming more of a manager and enabling your team to provide modern, high-fidelity assignments.

You can take this a step further by mirroring the same leadership and collaborative structures in the classroom. Have students rotate the “leader” role for a given task or initiative. This will give students an opportunity to take ownership of the quality of their work and to call out any A.I. work that is carelessly used. Many of your students will develop into lead engineers or managers, where they will be accountable for their subordinates’ success; and you can bet that the strategic use of A.I. platforms will be something they will need to speak to eventually.

Academic institutions should accept that A.I. is here for the foreseeable future; now, more than ever, professors and teaching assistants need true support (not lip service or empty regulations and policy changes). The best support is likely to be financial, investing in tools that can better detect LLM- and AI-generated code or text. Institutions can also provide support by recognizing professors who are taking strides in the direction of appropriate changes.

Final thoughts

During my undergraduate experience, the best professors in my program were ones who understood that students may cheat. There is a difference between getting and earning a degree. So, this whole conversation is not some new phenomenon to academia; rather, A.I. is just a new version of it, and one that I don’t find particularly remarkable at all.

Likewise, those same professors were usually the ones who offered challenging assignments and high levels of support, whether it be career advice, networking opportunities, or considerations for my personal or professional goals. They tried to adopt new developments in computer science, information technology, or any of the disciplines which are often grouped together as “Cybersecurity.” At the end of the day, these are the things that foster success.

Many undergraduates are fresh out of high school and still learning what it means to be a professional. If you are concerned about A.I. generated content overwhelming your curriculum, try to switch your point of view.

You, as the leader, have an opportunity to mentor a semester’s worth of upcoming professionals. And not just the ones who lead clubs and pay lip-service, or those who think they’ve manipulated the system because they asked an LLM to do their homework; but, really, also the ones who want to do the right thing, the ones who want to be prepared for the ever-changing workforce which they are about to inherit. The ones who work 40 hours around school so they can get their education and also eat every night.

Or the ones who generate A.I. content because they feel no one cares enough to check in the first place.