Data Falsification II
If you're a parent, you may have experienced this. At age 3, my daughter Cecilia walked out of the kitchen one morning and, seeing me in the hall, immediately looked up at the ceiling.
Me: Did you just eat some of that brownie?
Cecilia: No.
Me: What's that brown stuff all over your shirt?
Cecilia (still looking at the ceiling): I don't know...
I'm often reminded of this story when I hear about scientific fraud, because those who get caught seem, in retrospect, to have misbehaved in spectacularly obvious ways. The evidence of fraud is like chocolate smeared all over their shirts.
This newsletter is about data falsification, a term I use broadly here for fabrication as well as inappropriate modification of data. My focus is on falsification via duplication of numbers, figures, and images. (In an earlier newsletter I address other kinds of falsification.) One thing I find intriguing about duplication is that statistical reasoning often informs the detective work needed to uncover it.
I chose this topic because of a case that made national news again this week. I'll start with this case, branch out to others, then discuss preventive measures.
Cassava Sciences
On July 29 of last year, the pharmaceutical company Cassava Sciences announced that one of their drugs under development, simufilam, had been found to improve cognitive functioning among Alzheimer's patients. This was the first in a series of announcements and reports touting the drug's effectiveness.
The apparent success of simufilam created tremendous buzz in the medical community – and among investors – because approximately 6 million Americans suffer from Alzheimer's disease, but none of the currently approved treatments do more than partly alleviate the cognitive symptoms and delay their inevitable decline. Cassava claimed to have evidence that simufilam was actually reversing the decline. This was big news, and the value of Cassava stock shot up from around $3 per share to well over $100.
Crash and burn
Since Cassava's July announcement, the company has experienced a series of research-misconduct disasters, not all of which pertain to simufilam. The prominent journal PLoS One retracted five articles published by Cassava scientists and their collaborators. Two other journals published "expressions of concern" about three studies published by these researchers. City University of New York launched an investigation (still underway) into the work of a CUNY faculty member who'd been lead author or collaborator on many of the aforementioned studies. Meanwhile, Cassava and its researchers quickly developed a reputation for untrustworthiness among biomedical experts. By the time the simufilam story made the news again this week, the value of the company's stock had tanked (it's under $20 per share as of this morning).
What did the Cassava researchers do wrong?
Freud argued that when people lie in obvious ways, it's because they desire, unconsciously, to get caught. Guilt over telling the lie, stress from working to maintain it, fear of being discovered...it's all too much. People unconsciously advertise what they've done so that they can be outed and get some relief for their anxiety.
You may or may not agree with Freud, but his theory comes to mind when you discover how obviously deceptive Cassava researchers have been. (At least in retrospect the deceptions seem obvious; I don't mean to downplay the acuity of the experts who spotted them.)
The problem that crops up in these studies, again and again, is evidence that the data were falsified – either through duplication of complex images or through manipulation of those images.
For example, three of the five PLoS One articles were retracted, in part, because the articles contained multiple images of proteins that were identical in spite of supposedly coming from different samples. This would be like finding snowflakes that are identical. In each case, the researchers claimed that some sort of error was responsible, but PLoS One remained unconvinced. Identical snowflakes are highly suspicious, especially when there's other evidence – the manipulated images I mentioned – that falsification has occurred.
Duplication per se doesn't prove deceptive intent. Last week, for example, I noted that the actual number of wildfires in the U.S. during the early 20th century was exaggerated because individual reporting agencies didn't realize and/or didn't care that other agencies were sometimes reporting the same fires to the federal government. That's not deception; it's flawed methodology.
In contrast, the Cassava researchers treated duplicated images as if they had been obtained from separate sources, and the PLoS One editors, prompted by independent experts, acknowledged that this redundancy would've been physically impossible.
At this point, you may be wondering: What does all this have to do with statistics? Isn't spotting the deception just a matter of looking at pairs of images and noticing impossibly high degrees of similarity? Well, yes, but what makes those redundancies physically impossible is a statistical assumption. Two snowflakes could look exactly alike, in theory, just as two protein samples could look exactly alike. The laws of the universe don't say this is impossible. But the range of possible variation in these things is virtually infinite. Statistically speaking, the chances of two identical-looking snowflakes or protein samples are so fantastically small as to be indistinguishable from zero. And, there's more:
It turns out that in the Cassava studies, there was redundancy across multiple pairs of images. This would be like finding one pair of identical snowflakes, then another pair of identical snowflakes, then another, etc. These multiple duplications, plus evidence that a number of images had been doctored, were too much for the journal editors (and the rest of the scientific community), and now pretty much everyone but Cassava leadership and the researchers themselves has concluded that data were falsified.
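To make the "identical snowflakes" intuition concrete, here's a back-of-the-envelope calculation in Python. The specific numbers (image size, intensity levels, pixel independence) are hypothetical stand-ins, not properties of the actual Cassava images – the point is only the order of magnitude:

```python
import math

# Hypothetical stand-in for a published protein image: a 100 x 100 grid
# of pixels, each independently taking one of 256 intensity values.
# (A crude simplification, but it conveys the scale of the improbability.)
pixels = 100 * 100
levels = 256

# log10 of the probability that two independently produced images
# match pixel-for-pixel: (1/256)^10000
log10_one_pair = -pixels * math.log10(levels)
print(f"one identical pair:    ~10^{log10_one_pair:.0f}")    # ~10^-24082

# Finding several independent identical pairs multiplies the improbability:
log10_three_pairs = 3 * log10_one_pair
print(f"three identical pairs: ~10^{log10_three_pairs:.0f}")  # ~10^-72247
```

Even with far more forgiving assumptions (coarser images, correlated pixels), the probability of one exact duplicate – let alone several – is indistinguishable from zero.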
What next for Cassava?
Bankruptcy, I'm guessing. Evidence of data falsification among Cassava researchers keeps making national news. Existing "expressions of concern" by academic journals may turn into additional retractions pending the results of the CUNY investigation. Meanwhile, the SEC has launched a formal investigation into the possibility that Cassava falsified simufilam-related data.
Cassava's CEO, Remi Barbier, denies any wrongdoing. For example, he recently told the Wall Street Journal, "There is zero evidence, zero credible evidence, zero proof that I’ve ever engaged in, nor anyone I know, has ever engaged in funny business." In other words, Cassava-affiliated scientists happen to be both incredibly lucky (they keep finding identical snowflakes) and incredibly unlucky (software glitches keep altering the images they publish).
I'm not sympathetic to Mr. Barbier's plight. Alzheimer's is a terrible disease, and, apart from the overwhelming evidence of data falsification, experts in the field doubt Cassava's claims about simufilam on general principle. The mechanisms by which the drug supposedly operates don't make sense to them, and they can't imagine how such a drug could reverse (as opposed to merely delaying) the cognitive symptoms of the disease. In other words, Cassava doesn't seem to be just fibbing a little to make a good drug look better. They seem to be hoping to market snake oil.
Is Cassava unique?
Not at all. There's a long history of duplication techniques being used as a method of falsification. (Plagiarism, as well as publishing the same data more than once, also constitutes a form of duplication, but my focus here is on data.) In the Cassava research I've discussed, images were duplicated. In other studies, it's graphs, or even just parts of graphs, that are inappropriately duplicated.
Duplication of numbers rather than images has also been spotted. In some cases, an entire dataset is reused, and the researcher claims, falsely, that the reused data came from a different sample. In other cases, parts of a dataset are duplicated in order to increase sample size and/or obtain desired statistical outcomes.
Duplication of the latter sort might not be detectable if it's carried out on a small scale. For example, a researcher with IQ data for 99 actual participants might duplicate one participant's data in order to obtain a sample of exactly 100. And, even if sample stats were reported with some granularity, it wouldn't be unusual for, say, two males in the sample between ages 45 and 60 to have IQ scores of 103. As the extent of duplication increases, so does the suspiciousness of the data. For example, the Norwegian researcher Jon Sudbø admitted to multiple instances of data fabrication around the turn of the 21st century, including one Lancet article, since retracted, in which he invented an entire dataset for 908 participants. How do we know he invented that data (apart from his confession)? Well, for one thing, 250 of those 908 people had exactly the same birthday. The exact probability of this happening by chance (as opposed to copying and pasting data in a spreadsheet) depends on assumptions about who the potential participants could've been, but I estimate the chances to be less than one quintillionth of a quintillion. The number is so small I'm not even sure how to say it in plain English.
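The birthday claim can be sanity-checked with a short calculation. This is a rough estimate under simplifying assumptions – birthdays uniform across 365 days, participants independent – using the leading binomial term and a union bound over which day is shared:

```python
import math

n, k, days = 908, 250, 365  # 250 of 908 participants sharing one birthday

# log of P(exactly k of n people have one *given* birthday), binomial model
log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
log_p = log_binom - k * math.log(days) + (n - k) * math.log(1 - 1 / days)

# Union bound: the shared day could be any of the 365
log10_any_day = log_p / math.log(10) + math.log10(days)
print(f"chance of such a coincidence: ~10^{log10_any_day:.0f}")
```

Under these assumptions the result comes out around 10^-408 – vastly smaller even than "a quintillionth of a quintillion" (10^-36), so the estimate in the text is, if anything, conservative.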
The Burt affair
Duplication also occurs at the level of statistical results as opposed to raw data. A classic example can be seen in the case of Sir Cyril Burt, a prominent British psychologist who studied, among other things, the heritability of intelligence. In a series of articles, published mainly in the late 1940s and 1950s, Burt seems to have falsified data through various methods, including duplication of correlation coefficients for IQs of identical twins raised apart.
As you can imagine, it's not easy to find identical twins who have been raised separately, and so, after publishing his first study on the topic, Burt published newer versions of the study after adding more twins to the sample. However, the correlation coefficients he reported were the same in each report (0.77). The chances of these coefficients being the same as new data were added, again and again, are effectively zero.
Like many others, Burt was accused of falsification on the basis of a diversity of evidence (e.g., skepticism that he could've found as many separately-raised identical twins as he did, as well as indications that he invented fictitious research assistants). But one of the strongest clues is that redundancy: He just kept repeating coefficients that, realistically speaking, couldn't have been the same each time.
How bad is the duplication problem?
We'll never know the true extent of data falsification (defined broadly), because some researchers presumably get away with it. But there are signs that it's prevalent. For one thing, roughly 2% of scientists admit to having falsified data at least once in their careers (leaving one to wonder how many others have done so but don't admit it). As for duplication in particular, a 2016 study led by Elisabeth Bik looked at over 20,000 biomedical research articles and found that 3.8% of them contained inappropriately duplicated images. As the authors point out, their findings likely underestimate the full extent of the problem, since they only examined image duplications within the same article, and they didn't consider other kinds of duplications.
Another clue about the prevalence of falsification is that a lot of peer-reviewed articles get retracted. What constitutes "a lot" is a matter of opinion, but judging from the Retraction Watch database, I'd estimate that an average of about one article per day has been retracted from some peer-reviewed journal over the past seven years owing to falsification.
Even when duplication and other forms of falsification are revealed, the problem may linger. Retraction only negates the impact of a study among scientists; the public may not be aware it has been retracted. Some anti-vaxxers who cite Andrew Wakefield's research seem to be unaware that all of the relevant studies were retracted. Oh, and remember the study showing that people who eat meat are more selfish than vegetarians? Remember the study showing that disorderly environments (e.g., garbage on the streets) promote discrimination? I know two people who recall hearing about these studies, but neither of them realizes that both studies have since been retracted. The author, Diederik Stapel, was a famous social psychologist who has now had 58 articles retracted for various kinds of falsification, duplication being one strategy among the many he used.
What can we do about the duplication problem?
Duplication and other methods of falsification are currently being addressed on three fronts.
First, data sharing is a growing trend – researchers are voluntarily making raw data available to colleagues, and journals are requiring that raw data be made available, at least on request. This is a positive trend, because it facilitates new studies, prevents questionable approaches to data analysis, allows readers to identify errors, and discourages fraudulent practices such as falsification.
Second, an increasing number of experts are devoting time to identifying these practices. Earlier I mentioned Retraction Watch, which, since 2010, has tracked and explained retractions after they've occurred. There's also PubPeer, a website enabling discussion of published research that has contributed to the outing of numerous instances of falsification (including what seems to have happened at Cassava). And then there's Dr. Elisabeth Bik, a "super-spotter of duplicated images" whose work, she notes, has already led to 172 retractions and over 300 errata or corrections (and, importantly, has already influenced screening practices at some journals). Dr. Bik's story is fascinating – you can read about it here.
The third change – more of a call for change at this point – is a cultural one. You're probably familiar with the phrase "publish-or-perish", which, in academia, refers to the ways that incentives such as keeping one's job and getting promoted are linked to scholarly publication. Although not much has changed yet, the academic community has been questioning this linkage more vigorously in recent years, as it leads to shoddy work, piecemeal publications (multiple publications may be viewed more favorably than a single, more comprehensive piece), and, in some cases, fraud.
We can't all be Elisabeth Bik. But we can support her (here, for example), we can call for changes to academic culture and editorial practice, and – perhaps most importantly – when we hear about a study in the news, we can remember that the statistics might not just be flawed, or misinterpreted. They might be fudged. Hopefully experts will continue to spot "fudge" on the shirts of the researchers....