02/11/15

Open Access Could ‘KO’ Publication Bias

© : Antonio Abrignani 123rf.com


A crisis of sorts has been brewing in academic research circles. Daniele Fanelli found that the odds of reporting a positive result were 5 times higher among published papers in Psychology and Psychiatry than in Space Science. “Space Science had the lowest percentage of positive results (70.2%) and Psychology and Psychiatry the highest (91.5%).”
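To see where a figure like "5 times higher" can come from, here is a minimal sketch that converts the two quoted percentages into odds and takes their ratio. It uses only the numbers quoted above; Fanelli's own estimate may have come from a fitted statistical model, so this raw calculation is an approximation rather than a reproduction of his analysis.

```python
def odds(p: float) -> float:
    """Convert a proportion (0 < p < 1) into odds."""
    return p / (1.0 - p)

# Share of papers reporting positive results, as quoted from Fanelli.
psychology = 0.915      # Psychology and Psychiatry
space_science = 0.702   # Space Science

odds_ratio = odds(psychology) / odds(space_science)

print(f"Psychology/Psychiatry odds: {odds(psychology):.2f}")
print(f"Space Science odds:         {odds(space_science):.2f}")
print(f"Odds ratio:                 {odds_ratio:.2f}")
```

The raw odds ratio works out to about 4.6, which is in line with the roughly fivefold difference reported. Note that a 21-point gap in percentages becomes a much larger gap in odds because the psychology figure sits so close to 100%.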

Three Austrian researchers, Kühberger et al., randomly sampled 1,000 published articles from all areas of psychological research. They calculated p values, effect sizes and sample sizes for all the empirical papers and investigated the distribution of p values. They found a negative correlation between effect size and sample size. There was also “an inordinately high number of p values just passing the boundary of significance.” This pattern could not be explained by implicit or explicit power analysis. “The negative correlation between effect size and samples size, and the biased distribution of p values indicate pervasive publication bias in the entire field of psychology.” According to Kühberger et al., publication bias is present when studies with significant results have a better chance of being published than studies without them.
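Why would publication bias produce a negative correlation between effect size and sample size? A small simulation can illustrate the mechanism. This is only a sketch under stated assumptions, not the method Kühberger et al. used: every simulated study measures the same hypothetical true effect (d = 0.2), sample sizes are drawn at random between 10 and 200 per group, significance is judged with a large-sample z approximation, and only significant studies are "published."

```python
import math
import random
import statistics

TRUE_EFFECT = 0.2  # hypothetical true standardized effect, identical in every study

def simulate_study(n_per_group, rng):
    """Run one two-group study; return (observed effect size d, significant?)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(n_per_group)]
    pooled_sd = math.sqrt(
        (statistics.variance(control) + statistics.variance(treated)) / 2
    )
    d = (statistics.mean(treated) - statistics.mean(control)) / pooled_sd
    # Assumption: z approximation to the two-sample t test, two-sided alpha = 0.05.
    z = d * math.sqrt(n_per_group / 2)
    return d, abs(z) > 1.96

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)  # fixed seed so the run is repeatable
all_studies = []
for _ in range(2000):
    n = rng.randint(10, 200)
    d, significant = simulate_study(n, rng)
    all_studies.append((n, d, significant))

# "Publication bias": only significant studies make it into the literature.
published = [(n, d) for n, d, significant in all_studies if significant]

r_all = pearson([n for n, _, _ in all_studies], [d for _, d, _ in all_studies])
r_published = pearson([n for n, _ in published], [abs(d) for _, d in published])

print(f"Studies run: {len(all_studies)}, published: {len(published)}")
print(f"Correlation of n and d, all studies:       {r_all:+.2f}")
print(f"Correlation of n and |d|, published only:  {r_published:+.2f}")
```

Across all simulated studies, sample size and effect size are essentially unrelated. Among the "published" ones the correlation turns clearly negative, because a small study can only reach significance when it happens to overestimate the effect.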

Publication bias can occur at any stage of the publication process where a decision is made: in the researcher’s decision to write up a manuscript; in the decision to submit the manuscript to a journal; in the decision of journal editors to send a paper out for review; in the reviewer’s recommendation of acceptance or rejection; and in the final decision whether to accept the paper. Anticipating publication bias may lead researchers to conduct studies and analyze results in ways that increase the probability of getting a significant result and minimize the danger of non-significant results.

Charles Seife, in his December 2011 talk for Authors@Google, said it succinctly: “Most published research findings are wrong.” Seife was speaking on the topic of his book, Proofiness: The Dark Arts of Mathematical Deception. But Seife is not alone in this opinion; several others have made the same claim. In 2005, John Ioannidis published “Why Most Published Research Findings Are False” in PLOS Medicine. He said that for many scientific fields, claimed research findings may often be just measures of the prevailing bias of the field. “For most study designs and settings, it is more likely for a research claim to be false than true.” Uri Simonsohn has been called “The Data Detective” for his efforts in identifying and exposing cases of wrongdoing in psychology research.

These and other “misrepresentations” received a lot of attention at the NIH, with a series of meetings to discuss the nature of the problem and the best solutions to address it. Both the NIH and the NIMH published principles and guidelines for reporting research. The NIH guidelines were developed with input from the editors of over 30 basic/preclinical journals in which NIH-funded investigators have most often published their results. Thomas Insel, the director of NIMH, noted how the guidelines aimed “to improve the rigor of experimental design, with the intention of improving the odds that results” could be replicated.

Insel thought it was easy to misunderstand the so-called “reproducibility problem” (see “The Reproducibility Problem”). Acknowledging that science is not immune to fraudulent behavior, he said the vast majority of the time the reproducibility problem could be explained by other factors, which don’t involve intentional misrepresentation or fraud. He indicated that the new NIH guidelines were intended to address the problems with flawed experimental design. Insel guessed that misuse of statistics (think intentional fraud, as in “The Data Detective”) was only a small part of the problem. Nevertheless, flawed analysis (like p-hacking) needed more attention.

An important step towards fixing it is transparent and complete reporting of methods and data analysis, including any data collection or analysis that diverged from what was planned. One could also argue that this is a call to improve the teaching of experimental design and statistics for the next generation of researchers.

From reading several articles and critiques on this issue, my impression is that Insel may be minimizing the problem (see “How to Lie About Research”). Let’s return to Ioannidis, who said that several methodologists have pointed out that the high rates of nonreplication of research discoveries were a consequence of “claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically a p-value less than 0.05.” As Charles Seife commented: “Probabilities are only meaningful when placed in the proper context.”

Steven Goodman noted that p-values were widely used as a measure of statistical evidence in medical research papers. Yet they are extraordinarily difficult to interpret. “As a result, the P value’s inferential meaning is widely and often wildly misconstrued, a fact that has been pointed out in innumerable papers and books appearing since at least the 1940s.” Goodman then reviewed twelve common misconceptions about p-values. He also pointed out the possible consequences of these misunderstandings or misrepresentations of p-values. See Goodman’s article for a discussion of the problems.

After his examination of the key factors contributing to the inaccuracy of published research findings, Ioannidis suggested there were several corollaries that followed. Among them were: 1) the smaller the studies conducted, the less likely the research findings are to be true; 2) the smaller the effect sizes, the less likely the research findings are to be true; 3) the greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true; and 4) the hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. The first two could fit within Insel’s issue of flawed experimental design, but the third and fourth could not.

Moonesinghe et al. noted that while they agreed with Ioannidis that most published research findings are false, they were able to show that “replication of research findings enhances the positive predictive value of research findings being true.” However, their analysis did not consider the possibility of bias in the research. They noted that Ioannidis had shown that even a modest bias could decrease the positive predictive value of research dramatically. “Therefore if replication is to work in genuinely increasing the PPV of research claims, it should be coupled with full transparency and non-selective reporting of research results.”
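The interplay of bias and replication is easier to see with numbers. The sketch below implements the positive predictive value (PPV) formula from Ioannidis’s 2005 paper, where R is the pre-study odds that a tested relationship is true, alpha is the Type I error rate, beta is the Type II error rate, and u is the proportion of analyses that are reported as positive only because of bias. The replication function is a simplified Bayesian calculation that assumes n independent, unbiased studies which all reach significance; it is offered as an illustration of the idea and not necessarily the exact formula used by Moonesinghe et al. The input values (R = 0.1, alpha = 0.05, beta = 0.20, u = 0.2) are hypothetical.

```python
def ppv(R, alpha=0.05, beta=0.20, u=0.0):
    """PPV of a single significant finding (Ioannidis 2005), with bias u."""
    true_positives = (1 - beta) * R + u * beta * R
    all_positives = R + alpha - beta * R + u - u * alpha + u * beta * R
    return true_positives / all_positives

def ppv_replicated(R, n, alpha=0.05, beta=0.20):
    """PPV when n independent, unbiased studies are all significant.

    Assumption: a simplified illustration of the replication argument.
    """
    true_positives = R * (1 - beta) ** n
    false_positives = alpha ** n
    return true_positives / (true_positives + false_positives)

R = 0.1  # hypothetical pre-study odds: 1 true relationship per 10 false ones

print(f"Single study, no bias:      PPV = {ppv(R):.2f}")
print(f"Single study, bias u = 0.2: PPV = {ppv(R, u=0.2):.2f}")
for n in (1, 2, 3):
    print(f"{n} significant unbiased studies: PPV = {ppv_replicated(R, n):.2f}")
```

With these inputs, a single unbiased significant result has a PPV of about 0.62, a modest bias of 0.2 pulls that down to about 0.26, and a second significant unbiased study raises it to about 0.96. This is the sense in which replication only helps when it comes with non-selective reporting.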

While not everyone is supportive of the idea, open access within peer-reviewed scholarly research would go a long way to correcting many of these problems. Starting in January of 2017, the Bill & Melinda Gates Foundation will require all of its research to be published in an open access manner. Susannah Locke on Vox cited a chart from a 2012 UNESCO report showing that scholarly publications in clinical medicine and biomedicine have typically been less available through open access than those in other scientific fields. Access to psychology research began a downward trend in 2003 and was no better than the average for all fields by 2006.

There is a growing movement to widen Open Access (OA) to peer-reviewed scholarly research. The Budapest Open Access statement said that by ‘open access’ they meant “its free availability on the internet, permitting any user to read, download, copy, distribute, print, search, or link to the full texts of these articles.” Francis Collins, in an embedded video on the Wikipedia page for “Open Access,” noted the NIH’s support for open access. Effective on May 8, 2013, President Obama signed an Executive Order to make government-held data more accessible to the public.

Many of the concerns with academic research discussed here could be quickly and effectively dealt with through open access. The discussion of the “First Blood Test Able to Diagnose Depression in Adults,” looked at in “The Reproducibility Problem,” is an example of the benefit and power open access brings to the scientific process. There is a better future for academic research through open access. It may even knock publication bias out of the academic journals.