Cheating on Online Exams
Long before online cheating was possible, there was something you might call "onsole" cheating.
I know this because I attempted it once.
In 9th grade, before taking my first chemistry test, I wrote detailed notes on the sole of one of my shoes. The idea was that during the test, I'd cross my leg, twist my foot upward slightly, and periodically glance down at the notes.
At the time this seemed like a great idea. But I had written the notes on my shoe before walking to school, and chemistry wasn't until 4th period. Once the test got underway, I glanced down at my sole and immediately realized two things: All my notes had worn away, and I was going to fail the test. (Indeed, I failed spectacularly. 31 out of 100.)
A reasonable assumption?
As the Fall 2023 semester gets underway, educators aren't worrying much about students' shoes. AI chatbots like ChatGPT, widely considered threats to academic integrity, are spurring college instructors to rethink how they teach their courses, and some instructors are shifting to traditional, in-class paper exams.
Long before the rise of the chatbots, numerous studies showed that both faculty and students assume that cheating is more prevalent during exams administered online.
This seems like a reasonable assumption. But in this newsletter I'll be discussing a new review and a new study that raise some doubts.
Neither of these papers indicates more cheating during online as opposed to in-person exams. Both suggest that we know less than we think about the prevalence of cheating, while the new study also shows how clever statistics can shed light on this inherently murky issue.
Some background
Although most of us believe that cheating is more common during online exams than traditional ones, evidence on the actual prevalence is mixed. Some studies do find more cheating during online exams, while others show no differences between the two formats.
Why the discrepant findings? Some of the reasons are illustrated by the new review that appeared in the Journal of Academic Ethics on August 4.
A new review
This review, co-authored by Drs. Philip Newton and Keioni Essex at Swansea University, looked at the prevalence of cheating on online exams from 2012 through 2022. Included in the review were 19 studies comprising a total of 4,672 undergraduate participants.
The methods used to create the review were rigorous. Only studies asking students directly and anonymously whether they'd ever cheated on an online exam in college were included. At the same time, studies were excluded if published in "predatory journals". (These journals solicit publications aggressively, charge author fees, and fail to maintain acceptable review and editorial practices.)
The main finding was that 44.7% of students acknowledged cheating in some way on exams administered online. (The pre-COVID percentage was 29.9% but spiked to 54.7% during the pandemic.)
Interpreting the review data
It's always interesting to hear how people react to statistics like this. At one extreme are folks who intuit signs of widespread moral corruption. Others are like, "44.7%? Surely it's way more than that..."
Notice that the 44.7% is only an estimate of how many students have cheated at least once on an online exam while in college. This gives us pretty limited insight into prevalence, because some students only cheat once in their lives, others cheat dozens of times a month, etc. (At the same time, instructors sometimes assign online work that blurs the distinction between assignments and exams.) So it's hard to translate a statistic like that 44.7% into, say, an estimate of how much cheating would have taken place at a particular institution during a particular semester.
In studies on in-person exams, the percentages of students who acknowledge cheating vary widely. When estimates are lower than 44.7%, they're rarely more than about 10 percentage points lower. In short, this review doesn't provide much support for the assumption that cheating is especially prevalent for online exam formats.
More importantly, Newton and Essex emphasize the methodological limitations of the studies they reviewed. These limitations call for some humility in what we think we know about the prevalence of cheating on any sort of exam. Here is a slightly expanded version of what the researchers say:
1. Samples in cheating studies are inherently limited.
If you invite 10,000 randomly chosen undergraduates to complete a survey on cheating, and 9,999 agree to do so, then sure, you're positioned to make strong generalizations about academic integrity among undergraduates. However, the sampling methods used in cheating studies haven't attained this level of rigor.
For instance, none of the 19 studies that Newton and Essex reviewed used samples that were representative of larger student populations. Rather, the researchers surveyed whoever was willing to participate. And, as you might expect, not everyone who was invited to participate actually did so. (Across studies that reported response rates, 55.6% of invited students actually completed the surveys. We can't know the prevalence of cheating among those who declined to participate.)
2. Cheating is inherently difficult to measure.
One way to measure cheating is through direct observation. This method is unreliable, owing to false negatives (a student cheats, but the researcher doesn't spot it) as well as false positives (a student behaves suspiciously but isn't actually cheating).
Another approach, also pretty direct, is to simply ask students whether they've cheated in the past. But not all people who cheat will admit to it, a perennial problem in any research on sensitive topics. (The recognition that some cheaters don't acknowledge cheating, combined with the substantial percentage of students who choose not to participate in these surveys, leads researchers to assume that the prevalence of cheating is generally underestimated.)
A third approach is to "trap" students by surreptitiously monitoring their exam behavior (e.g., checking on internet activity). This approach is ethically problematic and, for logistical reasons, not as reliable as it sounds (a student may complete an online exam without consulting internet sources, yet cheat by glancing at his phone – or at the sole of his shoe). In addition, this method doesn't support comparisons between cheating during online vs. in-person exams, because the "traps" won't be identical in each case.
Cheating on exams can also be measured indirectly. A traditional approach is to assume that cheating has occurred when students obtain relatively high grades after taking a relatively long time to complete a test. Approaches like this are obviously unreliable.
This brings me to the new study, which relies on an indirect approach that's more sophisticated than the one I just described.
A new study
This study was published on July 24th in the influential journal Proceedings of the National Academy of Sciences. The authors, Dr. Jason Chan and Dahwi Ahn at Iowa State University, explored whether unproctored online exams are a meaningful way to assess student learning.
"Meaningful" could mean many things. Chan and Ahn focused on whether online and in-person exams yield similar results.
Of course, you could question (as many do) whether in-person exams are the best way to assess learning. But given how widely they're used, they do serve as a standard. Online exams can't be considered meaningful unless each student's performance on those exams is comparable to their performance on in-person exams.
Notice that I referred to "each student". Rather than comparing group means, as often done, Chan and Ahn looked at the correlation between individual student performance on in-person exams and the online exams they took during the same course.
How could the researchers do that? Well, thanks to the timing of the pandemic, many students spent the first half of the spring 2020 semester taking exams in class, and the second half of that semester taking exams online. This allowed for what's known as a within-subjects design, in which each student's in-person exam scores during the first half of spring 2020 were correlated with their scores on the online exams they took during the second half.
Specifically, Chan and Ahn obtained exam data for 2,010 undergraduates enrolled in 18 courses during the spring 2020 semester. For each student, in-person exam scores were averaged and online exam scores were averaged, and the correlation between these two averages was then calculated for each course. All 18 correlations were positive, and the overall correlation was 0.59. Chan and Ahn concluded that online and in-person exams provide similar results, and that online exams are thus meaningful as assessments. (If you have a statistical background, check the original article for important details on these Fisher's z-transformed correlations.)
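For the statistically curious, here's a minimal sketch of that kind of pooling: compute a Pearson correlation per course, Fisher-transform each, take a weighted average, and transform back. The course names, scores, and the conventional (n - 3) weighting below are my own assumptions for illustration; this is not Chan and Ahn's actual code.

```python
import numpy as np
from scipy import stats

def fisher_z_mean(rs, ns):
    """Pool per-course correlations via Fisher's z-transformation."""
    z = np.arctanh(rs)                    # transform each r to z
    weights = np.asarray(ns) - 3          # conventional (n - 3) weighting
    return np.tanh(np.average(z, weights=weights))  # back-transform to r

# Hypothetical per-course scores: (in-person averages, online averages).
# All numbers are made up for illustration.
courses = {
    "CHEM101": ([62, 75, 81, 90], [58, 70, 85, 88]),
    "PSYC230": ([55, 66, 78, 84, 91], [60, 61, 80, 79, 93]),
}

rs, ns = [], []
for in_person, online in courses.values():
    r, _ = stats.pearsonr(in_person, online)  # per-course correlation
    rs.append(r)
    ns.append(len(in_person))

print(f"Pooled correlation: {fisher_z_mean(rs, ns):.2f}")
```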
Data on cheating
Chan and Ahn also used a clever statistical approach to determine whether cheating appeared to be widespread following the shift to online exams.
Their analyses were based on two key assumptions: (i) Cheating will raise a student's grade, and (ii) cheating is more likely among students who have been performing more poorly in a class.
If the second assumption is correct, then students who did most poorly on their in-person exams during the first half of spring 2020 would be most likely to cheat on online exams during the second half of that semester.
How would you know they cheated? Assuming that cheating raises a student's grades, you'd expect to see consistently high grades on the online exams among students who performed most poorly on the in-person exams. Crudely, if cheating was prevalent among these students, you'd expect the data to look something like this:
The horizontal x-axis of this graph shows in-person exam scores. The farther right you go along this axis, the higher the scores on the in-person exams.
The y-axis of this graph shows online exam scores. The higher up you go on this axis, the higher the online exam scores.
On the right side of the graph, the diagonal, ascending line shows that beyond a certain point, students with higher scores on the in-person exams also got higher scores on the online exams.
On the left side of the graph, the flat line is suggestive of cheating, because it shows that students with the lowest scores on the in-person exams still scored relatively high on the online exams. From one side of that flat line to the other you see improvement in in-person exam scores (x-axis), but no change in online exam scores (y-axis).
Please don't take this graph too literally! Chan and Ahn used one like it to simply hint at the kind of curvilinear pattern one might predict if cheating was prevalent following the shift to online exams. The key point is that if we assume that cheating was prevalent – and that the cheaters were primarily students with the lowest grades on in-person exams – then the neat relationship between in-person and online exam scores you see on the right side of the graph would be disrupted, because the students with low in-person exam scores would score consistently and uncharacteristically high.
Chan and Ahn found no evidence of such a pattern. Instead, what they found is a linear relationship that can be depicted (overly simplistically) like this:
This graph illustrates that students who performed better on the in-person exams tended to perform better on the online exams. Further analyses showed that the pattern depicted here did not change significantly when taking into account exam question format, field of study, level of course, duration of exam, or class size.
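To make the contrast between the two hypothetical graphs concrete, here's one crude way to probe for the flattened-left pattern: fit both a linear and a quadratic model and check whether allowing curvature improves the fit. The simulated scores below are my own invention, not Chan and Ahn's data or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated scores for 200 students (all numbers made up).
in_person = rng.uniform(40, 100, size=200)
online = 0.8 * in_person + 15 + rng.normal(0, 8, size=200)  # linear world

# Ordinary least squares for a linear and a quadratic model.
X_linear = np.column_stack([np.ones_like(in_person), in_person])
X_quad = np.column_stack([X_linear, in_person ** 2])

for name, X in [("linear", X_linear), ("quadratic", X_quad)]:
    _, rss, *_ = np.linalg.lstsq(X, online, rcond=None)
    print(f"{name}: residual sum of squares = {rss[0]:,.0f}")

# A flattened left side (the hypothesized cheating signature) would make
# the quadratic model fit noticeably better; in a truly linear world,
# the two residual sums are nearly identical.
```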
Chan and Ahn concluded that cheating wasn't widespread on the online exams – or that if it was, it didn't appreciably alter students' grades. (That's an important distinction. Ultimately, the data show comparable performance on in-person and online exams. Although there's no direct evidence that online exam cheating was widespread, Chan and Ahn's data, like the review I discussed earlier, illustrate uncertainties about the true prevalence of the problem.)
Of course, the actual relationship between in-person and online exam scores is messier than the graph above suggests, even though they're significantly correlated. Poor performance in a class may predict cheating, but so do other variables that have been studied for decades. The correlation that Chan and Ahn identified doesn't seem strong enough to rule out increased online exam cheating stemming from desire for success, fear of failure, lack of integrity, etc.
I reached out to Dr. Chan to ask whether such variables may have played a role but were not detected in their analysis. Although, in retrospect, my question was garbled, his response was coherent:
"I have no doubt that students are more likely to cheat during online exams. In fact, we discussed the possibility that practically everyone cheats, but cheating is perhaps less helpful to test scores than we (students and teachers alike) assume. The rise of ChatGPT might change this, but we won’t know whether that is the case for another couple of years probably."
In sum, although the data show no signs of widespread cheating when the exams went online, what may have happened is that cheating increased without actually benefiting students' scores. Implicit in Dr. Chan's comment is one of the themes I've been pushing in this newsletter, which is that we remain somewhat uncertain as to the extent of cheating during online exams.
Conclusion: How to deter online cheating
Writing notes on my shoe turned out to be a waste of time. If I had been more clever (e.g., writing on my shoe in the bathroom immediately before 4th period), I would've been able to cheat and probably gotten away with it.
Perhaps now, with professors preoccupied with AI chatbots and other forms of internet support, a college student who engages in "onsole cheating" would be successful too.
I'm not recommending the practice. My point is that students who are motivated to cheat will find ways of doing so, and there are lots of strategies available to them, including, I discovered this week, countless how-to websites and blogs that offer free advice.
The new review and study don't rule out higher rates of cheating during online exams. (Dr. Chan: "I have no doubt that students are more likely to cheat during online exams.") Rather, these papers suggest that the differences may not be dramatic, and that we may know less than we think about the true extent of the problem.
So, regardless of actual prevalence, what can be done to deter cheating on exams administered online?
1. Remind students of the university honor code (or ask students to sign one).
Some studies show that when students sign an honor code, the incidence of cheating diminishes slightly. Studies also suggest that explicit warnings about the consequences of cheating may help. Although these strategies won't solve the problem, they seem desirable because they're quick and cost nothing.
2. Use online proctoring programs.
Online proctoring software, currently a $19 billion global market, can be used to monitor keystrokes and other indicators of computer activity. Some of these programs also access the computer's camera or microphone. These programs may be effective, but they raise concerns that include unconstitutional invasion of privacy, false positives, and racism (via facial recognition software that flags test-takers of color as "suspicious"). A broader concern is whether universities should be creating a culture of surveillance. (Although I dislike these programs, the "culture of surveillance" argument seems weak, in the sense that such a culture exists already in traditional, in-person exams, where the instructor monitors students visually.)
3. Make fuller use of learning management system (LMS) technology.
Instructors can give their classes multiple versions of online exams, impose reasonable time constraints, delay exam score feedback, and take advantage of other functions available through whatever LMS their university uses to make online exams available. These strategies, which require a bit more effort each semester on the instructor's part, make it more challenging for students to share answers.
4. Adjust exam format and content.
Adjusting the format and content of exams is becoming especially critical given the swiftness and, in many cases, accuracy of AI chatbots. At the moment, AI detection programs (such as the one Turnitin uses) are not very accurate – both false negatives and false positives abound.
Linking exam questions to specific course content makes it more difficult for the test-taker to obtain useful information from a chatbot unless the student already understands that content. There's much talk about broader strategies that might be helpful, but not much consensus.
This morning I discovered that, at least in some cases, minor weaknesses in ChatGPT's own answers to a question can be exploited to design new questions that it would not answer well.
For instance, I asked ChatGPT why the prevalence of academic cheating might be underestimated by researchers. It gave me a splendid, A+ answer that included everything discussed earlier in this newsletter and more. But, at the end of its answer, ChatGPT stated in passing that "Using a combination of research methods" can improve the accuracy of estimates.
That's a platitude; it may or may not be right. If you don't know the true prevalence of a behavior, then combining methods to gain corroborative data might not necessarily help. For instance, suppose you observe students taking an exam and make note of who you think cheated, and later you ask each student in the class (anonymously) whether they did cheat. It's reasonable to assume that the most gifted cheaters are the ones who will (a) avoid detection during the exam, and (b) claim later that they didn't cheat. In short, combining methods won't have helped you estimate the prevalence of cheating.
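To see that logic play out, here's a toy simulation. Every number in it (the 40% true prevalence, the detection and confession probabilities) is invented purely to illustrate the point.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

cheated = rng.random(n) < 0.40   # true prevalence: 40% (made up)
skill = rng.random(n)            # 0 = clumsy cheater, 1 = gifted

# Gifted cheaters are both less likely to be spotted by an observer
# and less likely to confess on an anonymous survey.
spotted = cheated & (rng.random(n) < (1 - skill) * 0.8)
admitted = cheated & (rng.random(n) < (1 - skill) * 0.9)

combined = spotted | admitted    # pool the two methods
print(f"True prevalence:   {cheated.mean():.1%}")
print(f"Combined estimate: {combined.mean():.1%}")
# The pooled estimate still falls well short of the truth, because the
# most gifted cheaters evade both methods.
```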
Since methods can be combined in more sophisticated ways than I just described, I asked ChatGPT: "Could a combination of research methods help gather more accurate data on the prevalence of academic cheating?"
This time I got a D- answer. ChatGPT answered "Certainly", then spit out a lot of verbiage on the general value of combining methods as a means of cross-validating data. Nothing that ChatGPT wrote was particularly wrong. But none of it was useful.
This little exercise gave me hope that we can stay one step ahead of the machines. By honest means, of course.
Thanks for reading!