We don’t always trust what we read, and that can be a good thing, according to researchers across the country who have released findings from a large-scale effort to replicate research.
Under the guidance of the Center for Open Science at the University of Virginia, nearly 100 previous studies were replicated to take a closer look at the reliability of individual scientific findings.
With only about one-third of the studies being replicated, the work shines a light on how psychology, and science in general, publicizes individual findings and emphasizes an ongoing need for study replication.
New England Psychologist’s Catherine Robertson Souter spoke with Jesse Chandler, Ph.D., adjunct faculty associate at the Institute for Social Research at the University of Michigan and a survey researcher at Mathematica Policy Research, about the project.
Q: What drove the creation of this study?
A: The project started at a time when there were growing concerns about the incentive structures in the publication of scientific research and the extent to which the body of existing research was cluttered with false positives.
These concerns were amplified by widespread skepticism about a paper on precognition and by rumors percolating about a high-profile case of academic fraud.
Outright academic fraud is incredibly rare, and certainly none of the findings in this particular paper pertain directly to that. But one of the things that made all these concerns difficult was that we had no real understanding of how replicable the field was in general.
We were trying to evaluate the extent to which a published finding can be taken and directly replicated. Typically, published research studies claim or imply that their findings represent a universally observable phenomenon that is robust enough to matter to psychological theory and in day-to-day life. Scientific knowledge is only useful if it is reproducible in some context beyond the point at which it is originally observed.
Q: How was the project structured?
A: The Center for Open Science, headed by Brian Nosek, Ph.D., professor of psychology at the University of Virginia, provided project management. The studies themselves were crowdsourced across several hundred authors at different institutions. We picked three specific journals in one specific time frame. The sampling of the articles was not totally random, since it was constrained by considerations like feasibility, but it represents our good-faith effort to avoid cherry-picking specific studies.
Each replication effort was designed with direct contact and feedback from the original authors, in part because space constraints in journals make it difficult to convey all the information necessary to directly replicate a study.
To further increase the chances of observing a significant result, the studies were designed to have high statistical power: a 90 percent chance of detecting an effect, provided that the effect size reported in the initial study was accurate.
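To give a sense of what designing for 90 percent power means in practice, here is a minimal sketch of the standard normal-approximation sample-size formula for a two-sided, two-sample comparison. The function name and the example effect sizes are illustrative, not taken from the project itself; the project computed power from each original study's reported effect size.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, power: float = 0.90, alpha: float = 0.05) -> int:
    """Approximate sample size per group for a two-sided, two-sample test,
    via the normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (Cohen's d)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    n = 2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2
    return ceil(n)

# A medium effect (d = 0.5) needs roughly 85 participants per group for
# 90% power at alpha = .05; a small effect (d = 0.2) needs far more.
print(n_per_group(0.5))  # 85
print(n_per_group(0.2))  # 526
```

The key implication, and one reason replications are so demanding: if the original study overstated the effect size, a replication powered against that inflated estimate will in fact have much less than 90 percent power.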
Q: The people who have been asked to help with the replication of their own work – have they been open to that?
A: By and large, most researchers provided materials and helpful commentary and feedback. In some cases, there were difficulties in obtaining original materials because the primary author had left the academy or the files could not be located. This was a big lesson about the importance of archiving materials.
It is easy to say that the truth of published findings is totally separate from the people who observe them, but it is natural for people to feel invested in their ideas and to feel that their individual reputation as a scientist is affected by their replicability.
We tried to make sure that researchers did not feel singled out or attacked. My hope is that if an interest in replication grows, people will become more comfortable in dealing with these sorts of issues.
Q: So, what were the results?
A: Maybe the most intuitive way to answer this question is to ask how many replications produced a statistically significant result in the same direction as the original study. By this measure, about one-third of studies replicated.
I was a little disappointed because I thought that number would be closer to one-half or even higher. However, it is certainly not as low as the results observed in studies of reproducibility conducted in cell biology, where only 11 percent to 25 percent of findings could be reproduced.
We found that the effect size and p-value of the original study were good predictors of replicability. We also found some weak evidence that the technical challenge of conducting a study makes it harder to replicate, suggesting that in some cases it might just be easier for either the original author or the replicator to get things wrong.
There are many more general positive outcomes of this study beyond a first estimate of the replicability of results. The project has established a model that can be followed, it has raised awareness about issues related to replicability, and it has, hopefully, reduced sensitivity about having one's work directly replicated.
Q: For the general public, how do you present these findings without it being taken, as it has been in some headlines, as proof that research is not to be trusted?
A: In some senses, skepticism of novel findings is warranted, but that does not mean that people should be skeptical of science as a process or skeptical of the consensus of aggregated findings over a long period of time.
The findings that seem to fly in the face of our intuitions or current scientific consensus are the most newsworthy. But, studies with these features happen to be the ones in which we should have the least confidence.
I am thinking specifically of things like anti-vaccine attitudes or climate change denial – there are always one or two studies that offer a counterargument against hundreds of existing papers – and in those cases, consumers of scientific knowledge need to consider the proportion of studies that find one thing versus something else.
One unfortunate thing about scientific discourse is that the first person to reach a finding gets the high-impact publication, and then it becomes difficult to publish related findings afterwards. I think that this effort suggests that there is a value and a place for replication, particularly for important results with policy or clinical implications.
Q: What happens next with this project?
A: This project is more or less finished, but all of the data are open access, which enables researchers to go back and draw their own conclusions. For example, a recent reanalysis of our data by Jeffrey Leek and colleagues suggested that close to 75 percent of the results we observed fell within the range of possible results predicted by the original studies.
This suggests that failure to replicate a study is not the last word on that particular phenomenon.
I am hopeful that these findings will inspire additional replication attempts, and I hope that other disciplines will engage in similar efforts. Indeed, this is already being done by the Center for Open Science for cancer biology. Aside from the obvious practical importance of these efforts for fields like medicine, clinical psychology and public policy, additional data from other fields may help us develop a better understanding of how and why effects do or don’t replicate.
By Catherine Robertson Souter