New Evidence of Scientific Fraud
Falsified data in a scientific paper. Nothing particularly ironic in that.
But what if the paper was about preventing dishonesty?
What if not one but two of the authors independently contributed forged data? And, what if each author didn't know what the other was doing, so that they were deceiving each other as well as the rest of us?
I don't think I could make up such a bizarre and richly ironic tale. As it turns out, I don't have to.
In this newsletter I'll be sharing with you a story about scientific fraud that has been evolving for nearly two years, including some new twists that emerged this week.
Statistics play multiple roles in this story. They helped generate the data, and they helped a trio of academic sleuths identify the falsifications.
There's been some snickering about this case – "honesty researcher accused of fraud" – but it has a dark side. Fraudulent data erodes public trust in science, not to mention scientists' trust in each other's work. In this case, both government and private agencies wasted time and money making changes called for by the forged data. Meanwhile, at least two universities and several journals have been forced to divert resources to investigate the fraud.
Most of this newsletter will describe how the falsified data was discovered, and what the fallout has been. (I'll be using terms like "falsified", because we can't always tell whether numbers were invented as opposed to altered.) At the end, I'll briefly discuss strategies for reducing scientific data fraud.
The original study
The story begins with a prominent journal and a team of superstar researchers.
In August 2012, Proceedings of the National Academy of Sciences (PNAS) published a study whose authors include Drs. Francesca Gino and Dan Ariely at the business schools of Harvard and Duke, respectively.
The study addressed a straightforward problem: When people self-report information, they sometimes cheat. In the U.S., hundreds of billions of dollars of revenue are lost each year owing to deceptive statements in tax returns, insurance claims, business expense reports, etc.
One strategy for addressing the problem is to ask people to sign honesty statements ("I promise that the information I am providing is true"). These statements are typically located at the end of the form a person is filling out.
In their 2012 paper, Gino, Ariely, and colleagues reported three experiments on whether these statements would be more effective if located instead at the beginning. For the moment I'll be focusing on Experiment 3, because this is where fraud was first identified.
To carry out this experiment, the researchers collaborated with an actual insurance company. This company routinely sent out renewal policies asking customers to provide, among other things, the current mileage on their vehicles. At the end of the renewal forms, customers sign the statement "I promise that the information I am providing is true."
The researchers persuaded the insurance company to revise half of the forms so that the honesty statement appeared at the beginning. The forms were otherwise the same. Data were then obtained on 13,488 policies.
(Customers were incentivized to underreport odometer readings, because they knew that higher mileages would mean higher premiums.)
The main finding was that people who signed at the beginning reported 10.25% more miles, on average, than people who signed at the end. Apparently, merely relocating the statement to the beginning inspired more honest reporting. (The logic here is that since the revised forms had been randomly assigned, the two groups shouldn't have differed, on average, in how much they actually drove; so any reliable difference in reported miles reflects a difference in honesty.)
This study was cited hundreds of times in the academic literature, and a number of agencies modified their forms accordingly. Here we see how science can be useful: Researchers acquire knowledge, and knowledge influences practice.
First evidence of fraud
On August 17, 2021, the blog DataColada publicly aired the first clear evidence of fraud in the 2012 study.
The authors of the blog, professors Uri Simonsohn (Ramon Llull University), Leif Nelson (UC Berkeley), and Joseph Simmons (University of Pennsylvania), were tipped off by an anonymous team, who continued to help the DataColada bloggers with their investigation.
Raw data from Experiment 3 had previously been made public; DataColada's statistical sleuthing provided the first clues that these data had been falsified.
To understand how the DataColada team proceeded, it helps to know that statistics rarely prove anything. In essence they're tools for managing uncertainty, and so the best they can do is to allow near certainty about some conclusion. For instance, the chance of tossing a normal coin and getting heads 20 times in a row is less than one in a million. This outcome is possible in theory, but it's so unlikely that if you do get 20 heads in 20 tosses, the first time you try, you can be nearly certain that the coin is abnormal (and you should check whether it's weighted, or whether both sides are heads).
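(For the skeptical, the coin arithmetic is a one-liner in Python:)

```python
p = 0.5 ** 20        # chance of 20 heads in 20 tosses of a fair coin
print(p)             # 9.5367431640625e-07
print(round(1 / p))  # 1048576, i.e., about one chance in a million
```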
Back to Experiment 3. The DataColada team examined the number of miles each car had supposedly been driven since the insurance policy had been purchased or most recently renewed. That is, they compared earlier odometer readings to the mileages drivers currently reported. Let's call these Time 1 and Time 2 mileages.
What you'd expect to see is a lot of variety from person to person in how much they drove between Time 1 and Time 2. Prior data suggests a bell curve: Many people are fairly close to average in how much they drive their cars, with fewer people driving especially large or small amounts.
What the DataColada team found instead was unbelievable consistency. The number of people who'd driven 45,000–50,000 miles was about the same as the number who'd driven 40,000–45,000 miles, the number who'd driven 35,000–40,000 miles, and so on all the way down to zero. In theory, this is possible, but the chances are vanishingly small.
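To see how strange that flatness is, here's a minimal simulation of what bell-curve driving data would look like when binned the same way. The mean and spread below are invented for illustration; they're not the study's actual numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bell curve of miles driven between Time 1 and Time 2
# (illustrative mean and spread, not taken from the study).
miles = rng.normal(loc=25_000, scale=10_000, size=13_488).clip(0)

# Count drivers in 5,000-mile bins from 0 to 50,000.
counts, edges = np.histogram(miles, bins=np.arange(0, 55_000, 5_000))
for lo, n in zip(edges[:-1], counts):
    print(f"{lo:>6.0f}-{lo + 5_000:<6.0f} miles: {n}")

# A bell curve piles up drivers in the middle bins and thins out at the
# extremes. The Experiment 3 data instead showed roughly equal counts in
# every bin, which is what a random-number generator would produce.
```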
A second clue had to do with rounding. When people report their mileage, they often round up or down to get an even number. How do we know when people have done that? Well, think about the last number on your odometer at some random moment in time. Every number from 0 to 9 has an equal chance of appearing there. But studies show that 0 is by far the most commonly reported final digit. Each of the other numbers (1 through 9) is less commonly reported, with no substantial difference among them. Thus, although some of those zeros may be actual observations, most of them just reflect rounding. The same holds for the last two digits (00 is most commonly reported) as well as the last three digits (000 is most common).
These patterns are indeed what the DataColada team found in the Time 1 reports. For instance, when they looked at the last three digits, 000 was reported 10.8% of the time. No other combination of three numbers was reported more than 1% of the time.
However, the Time 2 reports seem to have been falsified. For instance, at Time 2, 000 was reported only 0.09% of the time, a rate almost indistinguishable from the other three-digit endings (the highest being 0.12%).
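If you have a column of odometer readings handy, the rounding check takes only a few lines. Here's a sketch (the sample readings are invented):

```python
from collections import Counter

def trailing_rates(readings, n_digits=3):
    """Share of readings ending in each n-digit suffix (e.g., '000')."""
    total = len(readings)
    counts = Counter(f"{r % 10**n_digits:0{n_digits}d}" for r in readings)
    return {suffix: count / total for suffix, count in counts.most_common()}

# Invented human-reported mileages, heavy on round numbers:
reported = [12000, 8500, 23000, 14370, 9000, 31000, 20500, 18000, 27000]
print(trailing_rates(reported))
# '000' dominates, as it did in the Time 1 data (10.8%). In the suspect
# Time 2 data, '000' was no more common than any other ending.
```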
In sum, the statistical evidence strongly suggested data tampering. No proof yet, just near certainty.
Now the DataColada team discovered a couple of smoking guns in the data files. For instance, at Time 1, half the customers had mileage data written in Calibri font, while the other half had mileage written in Cambria.
The DC team demonstrated that the data had been entered in Calibri first, then duplicated in Cambria. Every Cambria entry had a "twin" in Calibri. Whoever did this tried to mask the duplication by adding a random number between 0 and 1,000 to each duplicated mileage. However, all this accomplished was to introduce a relatively tiny amount of noise to the data. Each Cambria mileage still lined up with a Calibri twin. In fact, the similarity between twins was so great overall that the DataColada team could not reproduce it even after one million random pairings of customer data from the dataset. (Along with DataColada's persistence and the statistical tools they used, we can also thank powerful computers for helping suss out the fraud.)
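To get a feel for DataColada's pairing test, here's a toy reconstruction. Everything in it is invented for illustration; the real analysis matched customers on all the vehicles in a policy, which I'm approximating here with four mileages per customer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: each "customer" has four car mileages. The Cambria half
# duplicates the Calibri half, plus a random 0-1,000 bump to hide the copying.
n_customers = 3_000
calibri = rng.integers(0, 150_000, size=(n_customers, 4))
cambria = calibri + rng.integers(0, 1_000, size=calibri.shape)

def is_twin(a, b, tol=1_000):
    """True if all four mileages agree to within `tol` miles."""
    return bool(np.all(np.abs(a - b) < tol))

# By construction, every Cambria customer has a Calibri twin:
assert all(is_twin(calibri[i], cambria[i]) for i in range(n_customers))

# Under random pairing, four simultaneous near-matches are wildly unlikely.
trials = 100_000  # DataColada ran a million
matches = 0
for _ in range(trials):
    i, j = rng.integers(n_customers, size=2)
    if i != j and is_twin(calibri[i], cambria[j]):
        matches += 1
print(matches)  # almost surely 0
```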
Finally, the Calibri data exhibited the usual pattern of rounding (including an exceptionally high number of mileages ending in 000), but the Cambria data showed no signs of rounding, further confirming that it had been falsified.
The DC bloggers concluded from these and other observations that in Experiment 3 of the 2012 study, both Time 1 and Time 2 data had been falsified.
Initial fallout
Reaction to DataColada's 2021 blog post was swift. Less than a month later, on September 13, PNAS formally retracted the study, citing DC's post as the primary evidence.
Four of the study authors responded individually to that post. Three of them, including Dan Ariely, agreed that the Experiment 3 data were fraudulent. One of the authors, Francesca Gino, described the post as "convincing" with respect to "serious anomalies" in the data but did not explicitly acknowledge fraud.
Dan Ariely also confirmed that he was the only author who'd been in contact with the insurance company and obtained the data file from them that he shared with his co-authors. Thus, as the DC team pointed out, the only possible sources of fabrication could be Dr. Ariely, someone in his lab, or someone at the insurance company. Forensic evidence in the actual data file implicates Dr. Ariely as the person responsible.
Not long after DC's blog post, the premier journal Science published an article drawing together a number of concerns about Dr. Ariely. Among those concerns:
—Metadata from the Excel file that Dr. Ariely provided for the 2012 study suggests that the file had been created just three days before he shared it.
—Dr. Ariely claims that he no longer has the insurance company's original file, and that the people he collaborated with at the company no longer work there.
—Dr. Ariely referred to the 2012 study in a 2008 lecture and a 2009 Harvard Business Review article, well before the study was supposedly conducted.
—Integrity concerns have been raised about other papers by Dr. Ariely. In 2021, the journal Psychological Science appended an expression of concern to one of his 2004 studies because a statistical program called statcheck (sort of analogous to spell check) found important errors that Ariely couldn't explain. And, in a 2010 interview with NPR, Ariely referred to data that apparently do not exist.
To date, Dr. Ariely has not offered a clear explanation for the falsified data in the 2012 study. He agrees that the data are fraudulent, but he hasn't explained what happened. As he told Science, "I wish I had a good story...And I just don't."
Interim summary
So far we have a 2012 study that was retracted in 2021 because everyone, including the authors themselves, recognized that the Experiment 3 data had been falsified. Statistics helped generate the data and spot the fabrications. Although it's not certain who was responsible for the fraud, circumstantial evidence strongly implicates Dan Ariely (while absolving his co-authors).
That's not the end of the story though.
A shift in focus
In 2021, the DataColada team and their anonymous colleagues began uncovering evidence of fraud in papers published by Dr. Francesca Gino at the Harvard Business School. Like Dan Ariely, Dr. Gino is a superstar researcher in the field of behavioral economics.
Dr. Gino was one of the co-authors of the 2012 study, but, as I mentioned, neither she nor any of the other authors (except Dan Ariely) could've been responsible for the falsified data in Experiment 3. However, it now appears that Gino was separately responsible for fraudulent data in Experiment 1 from the same study.
(You read that right: Different researcher, different experiment, same study.)
The 2012 study...again
Experiment 1 of the 2012 study consisted of data that Dr. Gino obtained herself, several years earlier, and then shared with the team.
This experiment focused on 101 university students and employees. Each individual completed 20 math puzzles with the understanding that they'd receive $1 for each puzzle correctly solved. Next, after discarding the puzzles, they completed a form in which they reported how much they earned from the puzzles. This form contained an honesty statement, in some cases at the beginning of the form, in other cases at the end. (There was also a control condition that we can safely ignore.)
Participants did not write their names on the puzzle sheets. However, unbeknownst to them, a code at the top of each form was linked to a number on that participant's puzzle sheet. This allowed the researchers to identify which sheet had been submitted by each participant, and thus to know who cheated.
A key finding was that 79% of participants cheated when they signed the honesty statement at the bottom, but only 37% cheated when they signed at the top.
That's pretty dramatic. Signing an honesty statement before rather than after reporting their earnings cut the incidence of cheating by more than half. You can see why the 2012 study was so influential – the same effect was demonstrated in a lab study involving puzzles as well as in a field study involving odometer readings.
More fraud
The DataColada team examined the Excel file from Experiment 1 and noticed that it was sorted by participant ID (1, 2, 3, 4, 5, etc.), except that eight observations were either duplicated (there were two participant #49s) or out of order. The DC team noted that it would be impossible to sort the Excel file by ID and end up with these eight anomalies. The eight rows could only have been altered by hand.
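Detecting this kind of tampering doesn't require anything fancy. Here's a minimal sketch of the check, run on invented IDs:

```python
def flag_sort_anomalies(ids):
    """Flag duplicate IDs and IDs that break an ascending sort order."""
    seen = set()
    anomalies = []
    for pos, pid in enumerate(ids):
        if pid in seen:
            anomalies.append((pos, pid, "duplicate"))
        elif pos > 0 and pid < ids[pos - 1]:
            anomalies.append((pos, pid, "out of order"))
        seen.add(pid)
    return anomalies

# Toy version of the pattern: mostly sorted, with a repeated #49
# and a #50 sitting where no sort could have put it.
ids = [46, 47, 48, 49, 49, 51, 50, 52]
print(flag_sort_anomalies(ids))
# [(4, 49, 'duplicate'), (6, 50, 'out of order')]
```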
DataColada ran statistical analyses which showed that these eight observations, taken together, provided unusually strong support for the authors' conclusions.
What happened exactly? The DC team examined a calcChain.xml file and determined that in most cases, conditions were switched. For example, a participant in the sign-at-the-bottom condition who didn't cheat was switched to the sign-at-the-top condition. This gave the false impression that signing at the top had been a deterrent.
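For the curious: an .xlsx file is just a zip archive, and the calcChain.xml inside records the order in which Excel last calculated its formula cells, so a row moved after the fact can end up with calc-chain neighbors that don't match its on-sheet neighbors. Here's a rough way to peek at that file (the filename is hypothetical, and not every workbook contains a calcChain.xml):

```python
import zipfile
import xml.etree.ElementTree as ET

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Hypothetical filename; any .xlsx containing formulas should work.
with zipfile.ZipFile("experiment1.xlsx") as xlsx:
    root = ET.fromstring(xlsx.read("xl/calcChain.xml"))

# Each <c> element names a formula cell, in calculation order.
calc_order = [c.attrib["r"] for c in root.iter(f"{NS}c")]
print(calc_order[:10])  # e.g., ['E2', 'E3', 'E4', ...]
```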
In sum, the numbers themselves may not have been altered. Rather, what got falsified was condition – i.e., the location of the honesty statements that eight participants signed – and this strengthened the results. Among the 2012 study authors, only Dr. Gino carried out or assisted with data collection for this experiment.
Even more fraud
Along with identifying data tampering in Experiment 1 of the 2012 study, the DataColada team has published two additional posts, the most recent on June 23, 2023, demonstrating fraud in other work published by Dr. Gino. A fourth post is expected soon. These posts are public versions of the private report previously sent to the Harvard Business School. Once again, statistics helped DataColada spot the fraudulent data.
First, in a June 20, 2023 post, the DataColada team examined raw data from a 2015 study co-authored by Gino. The sample consisted of 491 Harvard students. One question posed at the outset was "year in school". Most of the students wrote things like "sophomore", "4th year", or "2016". Those are sensible responses. However, the Excel sheet shows that 20 students wrote "Harvard." That makes no sense at all. One hungover, preoccupied, or intentionally malicious student might describe their year in school as "Harvard", but 20 of them? It's just not believable. Sure enough, when the DC team analyzed responses for these 20 students, the results provided unusually strong support for the researchers' hypotheses.
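With the raw file in hand, checking whether those 20 oddball rows carry the result takes only a few lines of pandas. The filename and column names below are hypothetical:

```python
import pandas as pd

df = pd.read_csv("study_2015.csv")  # hypothetical filename

# Flag rows whose "year in school" answer is the implausible "Harvard".
suspect = df["year_in_school"].str.strip().str.casefold() == "harvard"
print(suspect.sum())  # 20, in the file DataColada examined

# Does the headline effect depend on the suspect rows?
print(df.groupby(suspect)["outcome"].mean())
```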
Finally, in a June 23, 2023 post, DataColada discussed the raw data from one of Gino's 2014 studies. The study claimed to show that cheaters are more creative – i.e., people who cheat on a task obtain higher scores on a simple test of creativity.
In the Excel file that the DC team examined, the 178 participants were sorted by cheating behavior. All of the non-cheaters were listed first, followed by the cheaters. Within each group, creativity scores progressed from lower to higher. This is as it should be. However, among the cheaters, 13 observations seemed to be out of order. That is, their creativity scores didn't fall in between lower and higher scores in the list. The DC team noted that there was no way to sort the file to obtain such an ordering.
To illustrate: Among the cheaters, some people with creativity scores of 3 were followed by some people with scores of 4, and then some people with scores of 5. But if one person in the midst of the 4 group had a score of 18, this score was viewed as being out of place.
If you assume that an out-of-place score had been fraudulently changed, you might also assume that the original score matched its neighbors. So, scrolling down the data file, if you see 4, 4, 4, 18, 4 in the creativity score column, you could assume that the 18 had originally been a 4.
When the DC team changed the 13 out-of-order scores to presumed values (e.g., changing that 18 to a 4), they found that Gino's findings were no longer significant.
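Here's that presumed-value heuristic in a few lines of Python. It's a sketch of the idea, not DataColada's exact procedure:

```python
def repair(scores):
    """Replace values that break an ascending run with the previous value
    (the 'presumed original' described above)."""
    fixed = list(scores)
    for i in range(1, len(fixed) - 1):
        if not (fixed[i - 1] <= fixed[i] <= fixed[i + 1]):
            fixed[i] = fixed[i - 1]
    return fixed

print(repair([3, 3, 4, 4, 18, 4, 5, 5]))
# [3, 3, 4, 4, 4, 4, 5, 5] -- the out-of-place 18 becomes a presumed 4
```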
In other words, it appears that 13 creativity scores were changed in order to obtain the desired result.
Fallout
In the fall of 2021, the DataColada team shared a report of their concerns about Dr. Gino with the Harvard Business School. (Their blog posts from the past few weeks are the public version of this report.) The report appears to have had several consequences:
—At some point between May and June 2023, Dr. Gino was placed on administrative leave, and the name of her endowed chair (Tandon Family Professor of Business Administration) disappeared from the Harvard Business School website. As of today, Dr. Gino's CV describes her as presently appointed to the business school, but it also seems to hint that she no longer holds the endowed chair.
—A few days prior to June 17, 2023, Harvard asked some number of academic journals to retract three of the four papers discussed in the DataColada report, and to amend the retraction already issued for the 2012 study to make clear that both Experiments 1 and 3 report falsified data.
In short, this seems to be the early stages of a career being derailed, if not destroyed.
(By the way, the Chronicle of Higher Education is being credited by many as touching off inquiries into Dr. Gino's alleged fraud. This is because the Chronicle published an article on June 16, one day before DataColada went public with their first blog post. Actually, the DataColada team had been investigating Dr. Gino since 2021 and communicating privately with select individuals at Harvard and in the academic community. The Chronicle, relying on reports from involved parties, simply broke the story a day earlier.)
The researcher's response
DataColada believes that "perhaps dozens" of Gino's papers contain fake data. Their focus so far has been on four of the most egregious examples. Like other observers, I'm convinced that the data have been falsified, although I have no way of knowing whether Dr. Gino herself was directly responsible. As with Dr. Ariely, the evidence seems damning.
As far as I know, Dr. Gino has never publicly confirmed or explicitly denied fraud in any of her work. As of today, for instance, this is the entirety of what she says on her LinkedIn page:
"As I continue to evaluate these allegations and assess my options, I am limited into what I can say publicly.
I want to assure you that I take them seriously and they will be addressed.
I am humbled and gratified by all of the outreach from those whom have reached out to check in – your steadfast support means the world to me.
There will be more to come on all of this."
Sadly, there will indeed be more to come. The DataColada team will soon publish a fourth blog post with evidence of fraud in yet another of Gino's studies. The previous retraction, as well as Harvard's response, tells us that DataColada's work has been fully persuasive and is being treated by the academic community as more than merely well-reasoned allegations.
The broader problem
Data falsification is a problem that extends far beyond the work of Drs. Gino and Ariely.
I've written about this topic before, as have many others. The true extent of the problem is unknowable, given that some people get away with cheating and never cop to it. One meta-analysis showed that roughly 2% of scientists admit to having engaged in some sort of data falsification at least once. A larger percentage report having observed falsification, or at least questionable practices, among colleagues. Presumably, the actual percentages of researchers who have seen and/or engaged in falsification are even higher, given that not everyone responds, or responds honestly, to the surveys.
Solutions
How do we reduce scientific data fraud? Much has been said about this, including useful recommendations published yesterday by a working group of research integrity officers, journal editors and publishing staff.
Here I'll briefly touch on three broad suggestions.
1. Manage expectations.
Sadly, the cheaters we catch are often the ones who are least skillful at cheating. As brilliant as the DataColada bloggers have been with data forensics, the clues they picked up on might not have been left by comparably brilliant cheaters. The most realistic goal may be to reduce data tampering rather than wiping it out altogether. (Hence the importance of replication by multiple, independent research teams before trusting a finding.)
2. Broaden the lens.
There's a gray area between honest reporting of data and outright falsification. In this gray area lie dozens, if not hundreds, of practices that are considered wrong but not quite fraudulent. For instance, along with p-hacking (where a researcher keeps running analyses until one turns up significant), there's a ton of micro-level fudging, such as choosing whichever criteria for excluding outliers yield the most favorable results.
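To see why p-hacking is so corrosive, here's a minimal simulation of one common flavor: measuring many outcomes and reporting whichever test comes out significant. Note that there is no true effect anywhere in these data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_sims, n_outcomes, n_per_group = 5_000, 10, 30
false_positives = 0
for _ in range(n_sims):
    # Two groups that genuinely don't differ, measured on 10 outcomes.
    a = rng.normal(size=(n_per_group, n_outcomes))
    b = rng.normal(size=(n_per_group, n_outcomes))
    p_values = stats.ttest_ind(a, b).pvalue  # one t-test per outcome
    if p_values.min() < 0.05:                # report the "best" result
        false_positives += 1

print(false_positives / n_sims)  # ~0.40, not the advertised 0.05
```

Run enough tests and "significance" is nearly guaranteed, true effect or not.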
In my opinion, gray-area practices and outright fraud are both symptoms of the same problem: excessive incentivization for publishing new, statistically significant findings. This is a problem in academic culture that calls for cultural solutions, such as making promotion and tenure requirements less reliant on publication numbers, as well as increasing editorial openness to the publication of non-significant results. More on this in a future newsletter.
3. Strengthen existing deterrents.
Academic journals and professional organizations are increasingly requiring or at least encouraging researchers to publicly register their study methods and planned analyses in advance of publication, and to make their raw data publicly available. Strategies like these can function as deterrents, although, again, they're not likely to deter everyone. Meanwhile, increasing incentives for peer review (currently there aren't many) might help motivate reviewers to read more carefully and spot anomalies in the data.
Oh, I almost forgot: Evidence is accumulating that the location of an honesty statement (top vs. bottom of a form) does not affect the truthfulness of people's responses. Getting people to be more truthful isn't quite so easy. To be direct and honest is not safe, as Shakespeare said – and it may cost you more in taxes and insurance payments too.
Thanks for reading!