The Crisis Wrecking Modern Science
Biases toward exciting new results (and against negative ones) lead to widespread malpractice
In 2011, psychologist Daryl Bem published a paper in The Journal of Personality and Social Psychology (JPSP) that claimed statistically significant proof of extrasensory perception (ESP). A patently ridiculous result had gotten into the professional literature.
Stuart Ritchie, then a psychology graduate student at the University of Edinburgh, thought Bem had to be wrong. He and some colleagues redid Bem’s experiment—conducted a replication study—and indeed got a negative result. They found no evidence for ESP. Bem’s results were most likely a fluke. Ritchie’s group then submitted their replication study to JPSP, which had published Bem. Surely the journal would publish their follow-up so its readers could learn that independent researchers couldn’t reproduce Bem’s ESP results.
JPSP declined to publish. So what if the professional literature on ESP only included Bem’s fluke positive result? Ritchie’s group hadn’t discovered anything new or exciting. All they’d done was provide evidence against a previously published article. JPSP wasn’t in the business of publishing replication studies with negative results.
Ritchie had just discovered how the replication crisis works.
The replication crisis, for those who have not heard of it, refers to the sad fact that a terrifyingly large amount of published scientific research is junk. The problem spans psychology studies of “social priming,” marine biology studies of ocean acidification, and biomedical studies of cancer drugs: the research can’t be reproduced and therefore has no claim to scientific validity. We don’t know just how much research is junk because scientists haven’t yet tested to see which results hold up. But the junk research is a mountain, there are larger mountains of research built unsuspectingly on the junk, and the result is that each year we waste tens of billions of research dollars.
In his book Science Fictions, Ritchie argues that modern science’s flawed incentives produced the replication crisis. Scientists earn tenure, grants, and reputation from publishing research, above all from publishing exciting, new, positive results. Journal editors also prefer to publish exciting new research, so scientists just don’t submit negative results for publication. Some go to the file drawer. Others somehow turn into positive results as researchers, consciously or unconsciously, massage their data and their analyses. The result is massive publication bias: entire scientific literatures skewed by researchers and editors to magnify positive effects or create them out of whole cloth.
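The skew that publication bias produces can be illustrated with a small simulation. The sketch below is not from Ritchie's book; it assumes a hypothetical literature of small two-group studies with a tiny true effect, a simple z-test with known unit variance, and a journal that prints only positive, statistically significant results. All names and parameter values are illustrative.

```python
import random
from statistics import NormalDist, mean


def run_study(rng, true_effect, n_per_group):
    """Simulate one two-group study; return (observed effect, two-sided p-value).

    Assumption: outcomes are normal with known unit variance, so a z-test applies.
    """
    treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
    control = [rng.gauss(0, 1) for _ in range(n_per_group)]
    diff = mean(treated) - mean(control)
    std_err = (2 / n_per_group) ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / std_err)))
    return diff, p_value


def simulate_literature(true_effect=0.1, n_per_group=20,
                        n_studies=5000, alpha=0.05, seed=1):
    """Compare the average effect across all studies with the 'published' subset."""
    rng = random.Random(seed)
    results = [run_study(rng, true_effect, n_per_group) for _ in range(n_studies)]
    all_effects = [d for d, _ in results]
    # Hypothetical editorial filter: only positive, significant results get printed.
    published = [d for d, p in results if p < alpha and d > 0]
    return mean(all_effects), mean(published), len(published) / n_studies


if __name__ == "__main__":
    all_mean, published_mean, share = simulate_literature()
    print(f"mean effect, all studies:       {all_mean:.2f}")
    print(f"mean effect, published studies: {published_mean:.2f}")
    print(f"share of studies published:     {share:.1%}")
```

Under these assumptions the full set of studies averages close to the true effect, while the published subset reports an effect several times larger, because only the luckiest draws clear the significance bar. The rest stay in the file drawer.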
Abuse of statistics compounds the replication crisis. Decades ago, many disciplines adopted a default standard of statistical significance to determine which results indicated good evidence of associations, say between smoking and cancer. But scientists began to play fast and loose with their analysis when statistical significance became the requirement for a positive result and hence for publication.
Far too many scientists “p-hack,” that is, run statistical tests until a statistically significant association pops up. Some overfit, fitting a model so closely to random noise that it appears to show a pattern, or HARK: hypothesize after the results are known. Forming hypotheses after seeing the data might be acceptable in exploratory research, but HARKing effectively presents tentative exploratory research as if it were rigorously tested confirmatory research. Moreover, many scientists conduct studies with too little data to provide adequate statistical power. Those low-powered studies can’t determine anything reliably.
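A short simulation shows why running test after test all but guarantees a "finding." The sketch below is an illustration, not anything from Ritchie's book: it assumes a study with no real effect at all, a hypothetical researcher who measures several independent outcomes, and a z-test with known unit variance at the conventional 0.05 threshold.

```python
import random
from statistics import NormalDist, mean


def p_value_two_sample(a, b):
    """Two-sided z-test for a difference in means, assuming known unit variance."""
    z = (mean(a) - mean(b)) / ((1 / len(a) + 1 / len(b)) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def study_finds_something(rng, n_outcomes, n_per_group=30, alpha=0.05):
    """One null study: True if ANY of the outcomes tested comes out 'significant'."""
    for _ in range(n_outcomes):
        # Both groups come from the same distribution, so any "effect" is noise.
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if p_value_two_sample(treated, control) < alpha:
            return True
    return False


def false_positive_rate(n_outcomes, n_studies=2000, seed=0):
    """Fraction of null studies that can report at least one significant result."""
    rng = random.Random(seed)
    hits = sum(study_finds_something(rng, n_outcomes) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        simulated = false_positive_rate(k)
        expected = 1 - 0.95 ** k  # chance of at least one hit in k independent tests
        print(f"{k:2d} outcomes tested: simulated {simulated:.2f}, expected {expected:.2f}")
```

With a single pre-specified test, about one null study in twenty crosses the threshold. With twenty independent tests, the expected share is 1 - 0.95^20, roughly 64 percent. Real outcome measures are usually correlated, so actual inflation would differ from this idealized case.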
But p-hacking, HARKing, and underpowered studies guarantee publication, even if the results are surely false positives. Scientists’ career incentives lead them to massive abuse of statistics to produce journal-ready statistical phantasms.
Science also suffers from negligence. Astonishing amounts of research contain simple errors in their numbers. Cancer studies mislabel the cell lines they purport to study; researchers conducting animal studies fail to use proper randomization and blinding. Somehow such negligence usually tips results in the direction of statistical significance. Then groupthink inhibits publication of results that go against disciplinary or political presuppositions. Peer review now serves as much to enforce groupthink as to check for professional value. Ritchie suggests cautiously, indeed too cautiously, that the groupthink of liberal politics may also contribute to the replication crisis.
Deliberate fraudsters also worsen the replication crisis. Some make up data, some reuse images, some invent interviews. Asia produces far too many fraudsters, in an axis running from Japan to South Korea to China to India. Japanese anaesthesiologist Yoshitaka Fujii, reigning world champion of scientific fraud, published 183 papers on the strength of data from made-up drug trials. Yet America and Europe produce fraudsters enough. Diederik Stapel, a social psychologist in the Netherlands, made his career by fabricating data on the psychology of prejudice and racial stereotypes.
These incentives produce ever worse scientists because bad science succeeds. Scientists with bad procedures publish more, so they gain ever more reputation and funding. They then become eminent senior scientists who pass on their bad procedures to graduate students. Science’s natural selection evolves careless scientists tolerant of fraud, who seek out publication rather than the truth. The result is mass publication of underpowered small studies, guaranteed to contain large numbers of false positives.
Scientists know how to produce solid, reproducible science. It’s just that the scientific community’s incentives give them an overwhelming motivation not to bother.
Ritchie suggests a number of reforms to ameliorate the replication crisis. These include open data, pre-registered research protocols, computerized checks on statistical accuracy, guaranteed journal publication of replication research and negative results, abandonment (or at least reform) of the default standard of statistical significance—a host of highly technical changes. Ritchie’s guiding principle is that science needs to shift its incentives to encourage better research practices.
Specialists might argue about details of Ritchie’s narrative and analysis, as when he underplays the effect of liberal bias, but only to quibble. Ritchie lays out the dimensions of the replication crisis lucidly and entertainingly. He illustrates his arguments with apt examples. His notes will direct readers who wish to learn more about the replication crisis to an excellent range of professional literature. Science Fictions is an excellent introduction to the replication crisis.
David Randall is director of research at the National Association of Scholars.