Cheating on Exams
As the spring 2020 semester got underway, roughly 35% of U.S. college students were enrolled in at least one online class, and over 30 states provided online learning opportunities to K-12 students in public schools. Then the pandemic struck, and online instruction suddenly became the norm, everywhere and at all levels.
The rapid growth of online instruction after February 2020 brought chaos and confusion, and a host of new concerns. One concern (though certainly not the most pressing) is that cheating on exams will be more prevalent in online classes. The reasoning is that taking an exam online, alone in your room, makes it easy to collude with other students, consult forbidden materials, hire “experts”, etc.
This concern may be unfounded. Some studies comparing face-to-face to online formats show no differences in extent of cheating, while a few actually report more cheating in face-to-face classes. This is not to say that cheating isn't a problem in online classes, but simply that online formats may not be exacerbating an existing problem. Either way, it’s a problem worth taking seriously. An often-cited survey of over 70,000 high school students revealed that 95% reported having cheated at least once in school.
Broadly, instructors can deter cheating on exams in two ways: Logistical strategies (which make it more difficult for students to cheat) and/or motivational strategies (which reduce the temptation to cheat). I'll discuss motivation at the end of this newsletter; my focus is on logistics – specifically, how instructors can prevent cheating during online exams.
I chose this topic owing to a study this March that got a lot of attention in the news and in higher education reports. The study, conducted by Mengzhou Li and colleagues at Rensselaer Polytechnic Institute (RPI), focused on preventing collusion (the type of cheating in which students share information during exams, and the type that studies show to be most common in online classes). The RPI study offers guidance on how to address cheating; it also illustrates that the fanciest, most powerful statistics in the world can't solve a problem if the statistics are built on the wrong assumptions.
So, with respect to logistics, what can we do to prevent collusion during online exams? I'll talk about non-statistical approaches first, followed by statistical ones that were incorporated into the RPI study.
1. Observe students.
Watching your students take exams via Zoom, for example, can reduce cheating, but this is just as fallible as watching them in person. (Students are quite good at surreptitious texting, for instance.) Hence the rise of online proctoring services, such as ProctorTrack, Proctorio, and Examity, which offer several advantages over watching students yourself. These services provide live proctors, who observe individual test-takers via webcam. In addition, software can be used to access each test-taker's computer, preventing them from leaving the online testing environment (e.g., to open another browser window) and flagging any behavior that might indicate cheating (e.g., looking away from the screen for more than a few moments). As you can guess, online proctoring services are helpful but not infallible. (Surreptitious texting remains possible.) Plus, these services are expensive, and they raise privacy concerns, since they need access to students' computers and webcams.
2. Create multiple versions of exams.
This is a common approach for closed-ended question formats such as multiple-choice. Instructors create question banks and then ensure that some or all students take exams differing at least somewhat in content. The assumption is that cheating will be impossible, or have little impact on grades, if each student takes an exam that differs from the exams other students take. This strategy is clearly helpful, but it suffers from two practical limitations:
(a) It only works when multiple questions per topic, or multiple versions of the same question, can be created and still yield exams that are comparable in scope and difficulty. That might be easy to do for 2nd grade math ("9 + 7 = ?" seems comparable to "6 + 8 = ?"), but less so for many other classes.
(b) It requires instructors to generate a lot of questions. Instructors often underestimate how many they need. (Here's how to figure it out: Once you've decided how many questions you want on an exam, and how much overlap you're willing to allow, on average, from exam to exam, the total number of unique questions you'll need can be calculated by squaring the former and dividing by the latter. For example, if you're planning an exam with 40 questions, and you don't want the exams to have more than 4 questions in common, on average, you'll need 40 squared divided by 4 questions. That's 400 questions!)
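The rule of thumb above drops out of a simple fact: if each exam draws its n questions uniformly at random from a bank of N, two exams share n × (n/N) = n²/N questions on average, so hitting a target average overlap of k requires a bank of N = n²/k. Here's a minimal Python sketch (the helper names are mine, and the simulation assumes purely random draws):

```python
import math
import random

def bank_size_needed(exam_length: int, max_avg_overlap: float) -> int:
    """Bank size N so that two random exams of exam_length questions
    share about max_avg_overlap questions on average.

    With uniform random draws, expected overlap = n * (n / N) = n**2 / N,
    so solving for N gives N = n**2 / overlap.
    """
    return math.ceil(exam_length ** 2 / max_avg_overlap)

# The newsletter's example: a 40-question exam, at most 4 shared questions on average.
print(bank_size_needed(40, 4))  # → 400

def avg_overlap(bank: int, n: int, trials: int = 20_000) -> float:
    """Sanity check by simulation: draw two n-question exams from the bank
    and count shared questions, averaged over many trials."""
    total = 0
    for _ in range(trials):
        a = set(random.sample(range(bank), n))
        b = set(random.sample(range(bank), n))
        total += len(a & b)
    return total / trials

print(avg_overlap(400, 40))  # close to 4.0
```

The simulation isn't needed in practice; it just confirms that random draws from a 400-question bank really do overlap by about 4 questions.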
3. Adjust the exam format.
Online assessment software typically allows you to tweak the exam format so that cheating is more difficult. Examples include presenting each student the questions in a different order, imposing a time limit for answering each question, preventing backtracking to earlier questions, and only allowing students to see one question at a time. These format-level strategies are helpful but not infallible. If students have open lines of communication (e.g., via texting), they can still share information quickly, and these strategies end up only reducing the efficiency of cheating (e.g., some of the texted information won't be useful, because the recipient has already completed a question that required it).
4. Use Computerized Adaptive Testing (CAT).
CAT is based on programs that determine which questions will be presented to individual test-takers on the basis of their responses to prior questions. Successful performance on intermediate-level questions causes the program to present increasingly difficult questions, for example, whereas a test-taker who struggles with intermediate questions might then receive easier ones.
CAT programs are grounded in a psychometric approach (item response theory) that reflects important advances in statistics during the past half century. The main purpose of CAT isn't to prevent cheating, but rather to develop shorter, more precise tests (e.g., by ensuring that each test-taker doesn't have to struggle with too many questions they find impossibly difficult, or waste time on too many they find ridiculously easy). However, CAT also makes it more difficult for students to cheat via collusion, because few if any students will be taking exactly the same test.
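To make the selection mechanism concrete, here's a toy sketch of the core adaptive loop. Real CAT engines estimate ability with item response theory models; this version substitutes made-up question data and a crude shrinking-step ability update, so treat it as an illustration of the selection logic only:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Question:
    text: str
    difficulty: float  # e.g., -3 (easy) to +3 (hard)

def run_cat(bank, answer_fn, num_questions=5):
    """Toy adaptive test: pick the unused question whose difficulty is
    closest to the current ability estimate, then nudge the estimate
    up after a correct answer and down after an incorrect one."""
    ability = 0.0   # start every test-taker at an intermediate level
    step = 1.0      # shrink the adjustment as evidence accumulates
    remaining = list(bank)
    asked = []
    for _ in range(min(num_questions, len(remaining))):
        q = min(remaining, key=lambda q: abs(q.difficulty - ability))
        remaining.remove(q)
        correct = answer_fn(q)
        ability += step if correct else -step
        step *= 0.7
        asked.append((q, correct))
    return ability, asked

# Hypothetical test-taker who answers correctly whenever the question's
# difficulty is below their true ability of 1.0.
bank = [Question(f"Q{i}", d) for i, d in enumerate([-2, -1, 0, 1, 2])]
ability, asked = run_cat(bank, lambda q: q.difficulty < 1.0)
```

Because each question depends on the answers so far, two students quickly diverge onto different question sequences, which is the anti-collusion side effect described above.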
Although CAT reduces cheating, it's not infallible, because the more similarly two or more students are performing on a test, the more similar their test content will become. More importantly, CAT just can't be used in many classes, because considerable time and resources are necessary to develop each test. I mention CAT here because it illustrates a statistical success story, and because it's used in the RPI study I'm about to discuss.
5. Use optimized collusion prevention.
"Optimized collusion prevention" (OCP) is the name that the RPI researchers use for their new approach, which is grounded in CAT and a mix of statistical strategies that are highly sophisticated – and, ultimately, almost completely useless, because they're based on unrealistic assumptions. More on that shortly.
OCP is a slightly enhanced combination of strategies 2, 3, and 4 above. The RPI researchers observed that combining these strategies is effective, but not very practical, owing to the need to create such large test banks (strategy 2). However, if you can identify students who are performing at the same level, and you assume that these students wouldn't cheat from each other (because it's only weaker students who cheat from stronger ones), then you can give students performing at the same level the same exams, thereby reducing the number of questions needed for your test bank.
After a great deal of statistical maneuvering, the researchers figured out how to label students who have similar competency levels in such a way that an instructor would only need a test bank that's 1.5 times larger than the actual exam. For example, if you were giving a 30-question exam, you would only need to create 45 questions for the OCP program to draw from.
Using a sample of 78 undergraduates in a computer science class, the RPI researchers identified students' competency levels from performance on a multiple-choice mid-term exam. (The exact details of the competency levels aren't given in the article. This seems like an oversight, but I'll give the researchers a pass because it's the OCP algorithm rather than the instructor that determines level, and one can hope (probably unwisely) that this is done well.) Multiple-choice final exams were then delivered using an OCP program, and the researchers found evidence of reduced collusion from mid-term to final. This wasn't direct evidence (e.g., an anonymous survey asking students whether they cheated) but rather statistical evidence: a reduction in the probable extent of collusion. That's extremely indirect, but here again let's give the researchers a pass, because there are deeper problems ahead.
So, is OCP a good countermeasure for online cheating via collusion (at least for multiple-choice exams and other closed-ended formats)? Heck no. Here are some key problems:
1. OCP requires groupings of students with similar competence. As the researchers put it, "students with similar competencies have small probabilities to cheat within their group due to the fact that they can only obtain tiny collusion gains". I imagine you're already thinking about why this is a deeply flawed assumption:
(a) It's not clear whether an appreciable number of students recognize peers who are at the same competency level.
(b) Even if students do know peers at the same competency level, they might still collude with them anyway. After all, your friend might be better than you on a particular type of question, even though you're generally comparable in skill. A separate issue is that students don't always come to exams prepared to do their best work. If you haven't studied much, or you're slightly ill, you might be happy to reach out to another student who's ordinarily at your level.
2. The OCP statistical model contains a number of problematic, built-in assumptions. For example, the model assumes that...
...cheating is unidirectional. If Chris cheats from Alex, then Alex won't cheat from Chris.
...a cheater only relies on one source. If Jayden cheats from Maria, then Jayden won't cheat from anyone else.
...a cheater won't pass on what they learn from cheating. If George cheats from Amy, then George won't share information with Ash or Zoe.
All three assumptions are untenable. Almost any instructor can describe, from first- or second-hand experience, instances in which one of these assumptions did not hold.
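One way to see how restrictive the model is: treat collusion as a directed "who copies from whom" graph and check the three assumptions as constraints. This is my own formalization for illustration, not the paper's notation:

```python
def satisfies_ocp_assumptions(copy_edges):
    """copy_edges: a set of (cheater, source) pairs, meaning the cheater
    copies from the source. Returns True only if all three modeled
    assumptions hold:
      1. Unidirectional: no pair copies from each other.
      2. Single source: each cheater copies from at most one person.
      3. No relaying: nobody who copies is also someone else's source.
    """
    cheaters = [c for c, _ in copy_edges]
    sources = {s for _, s in copy_edges}
    unidirectional = all((s, c) not in copy_edges for c, s in copy_edges)
    single_source = len(cheaters) == len(set(cheaters))
    no_relaying = not (set(cheaters) & sources)
    return unidirectional and single_source and no_relaying

# Each everyday scenario below violates one assumption:
assert not satisfies_ocp_assumptions({("Chris", "Alex"), ("Alex", "Chris")})    # mutual cheating
assert not satisfies_ocp_assumptions({("Jayden", "Maria"), ("Jayden", "Zoe")})  # two sources
assert not satisfies_ocp_assumptions({("George", "Amy"), ("Ash", "George")})    # relaying
# Only the tidy, one-way, one-source, no-chain world passes:
assert satisfies_ocp_assumptions({("Chris", "Alex"), ("Jayden", "Maria")})
```

The assertions encode exactly the everyday scenarios instructors describe, and each one fails the model's constraints.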
3. You still have to buy the program – when it becomes available – and it's not going to work unless CAT is possible for your exams.
At best, the RPI study merely shows that logistical strategies can reduce cheating under conditions that are never observed in the real world.
There are many studies like this. They remind me of a story (which I just made up) about a scientist who believed that the moon is made of cottage cheese. This particular scientist spent years creating a robotic vehicle that could move across the moon's surface. Drawing on advanced knowledge of robotics, facts about the moon's gravity and atmosphere, and data on the viscosity of cottage cheese at different temperatures, the scientist built his vehicle, ran tests, and then estimated with extraordinary accuracy the maximum speed of this vehicle on the moon. Another scientist checked his calculations and reported that they were correct. Eventually, the entire scientific community agreed: The estimates of maximum speed are perfectly accurate. If the moon is made of cottage cheese...
If optimized collusion prevention isn't the answer, what else can an instructor do? Here are some motivational strategies that experts have discussed:
1. Remind students of your rules (and your school's policy) concerning academic dishonesty. (Anecdotal evidence suggests that doing so may reduce cheating by making the importance of the rules, and student accountability, more salient.)
2. Be as concrete and clear as possible about your expectations for the course. (Evidence suggests that cheating increases when students are uncertain how to succeed in a class.)
3. Make sure that student grades are reflective of many kinds of work, not just a few high-stakes exams. (Studies suggest that cheating is more likely on stressful, high-stakes assessments.)
4. Create tests that can't be cheated from. Open-ended questions, particularly those that require higher-order thinking (e.g., synthesis and evaluation) reduce the temptation to cheat via collusion, because students know that you'll be grading their exams and noticing extensive similarities in answers. This isn't a perfect strategy, of course, because (a) you might not want to use open-ended questions on your exams, and, even if you do (b) students may cheat anyway, because they assume you're grading too many exams to notice, or because they can't distinguish between overlap that might be expected (e.g., two exams that contain the sentence "Martin Luther King Jr. was born in 1929") versus overlap that's suspicious (e.g., two exams with the sentence "Renowned civil rights leader Martin Luther King Jr. won the Nobel Prize in 1962", which is distinctively phrased and incorrect about the date). Plus, (c) you miss collusion anyway when students share information that's sufficiently broad (e.g., guidance on which formula to apply to an engineering problem) or grounded in factual details (e.g., the correct order of a sequence of historical events). All the same, there are lots of good reasons (including some unrelated to the prevention of cheating) for including at least some open-ended questions on exams.
5. Engage your students. Much has been written about this topic, so I'll be brief. Cheating seems to be less likely in classrooms that function like healthy communities, where students feel respected, empowered to express themselves, and involved in meaningful work. In short, cheating diminishes when the learning experience and the community rather than just the grade are important to students. These are desirable goals for any classroom, and apart from reducing cheating they create a better educational experience for all.
(Remember, if you copy any text from this newsletter, be sure to cite the source! :)
Thanks for reading!