02/11/15

Open Access Could ‘KO’ Publication Bias

© Antonio Abrignani / 123rf.com

A crisis of sorts has been brewing in academic research circles. Daniele Fanelli found that the odds of reporting a positive result were 5 times higher among published papers in Psychology and Psychiatry than in Space Science. “Space Science had the lowest percentage of positive results (70.2%) and Psychology and Psychiatry the highest (91.5%).”

Three Austrian researchers, Kühberger et al., randomly sampled 1,000 published articles from all areas of psychological research. They calculated p values, effect sizes and sample sizes for all the empirical papers and investigated the distribution of p values. They found a negative correlation between effect size and sample size, as well as “an inordinately high number of p values just passing the boundary of significance.” This pattern could not be explained by implicit or explicit power analysis. “The negative correlation between effect size and sample size, and the biased distribution of p values indicate pervasive publication bias in the entire field of psychology.” According to Kühberger et al., publication bias is present when a paper has a better chance of being published if its analysis yields significant results.
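To make the pattern concrete, here is a minimal simulation sketch (my own illustration, not code from Kühberger et al.): when journals publish only studies that reach p < 0.05, small studies make it into print only when their observed effects happen to be inflated, which by itself produces the negative correlation between effect size and sample size that Kühberger et al. report.

```python
# A minimal sketch (not Kühberger et al.'s code) of how selective publication
# induces a negative correlation between reported effect size and sample size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d = 0.2                        # assume the same small true effect in every study
results = []

for _ in range(5000):
    n = rng.integers(10, 200)       # per-group sample size varies across studies
    a = rng.normal(0.0, 1.0, n)     # control group
    b = rng.normal(true_d, 1.0, n)  # treatment group
    t, p = stats.ttest_ind(b, a)
    d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    results.append((n, d, p))

# "Publish" only the significant studies, then correlate n with observed d.
published = [(n, d) for n, d, p in results if p < 0.05]
ns, ds = zip(*published)
r, _ = stats.pearsonr(ns, ds)
print(f"published: {len(published)} of {len(results)} studies, r(n, d) = {r:.2f}")
# Small studies only clear p < 0.05 when their observed effect is inflated, so
# the published record shows effect sizes shrinking as sample sizes grow.
```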

Publication bias can occur at any stage of the publication process where a decision is made: in the researcher’s decision to write up a manuscript; in the decision to submit the manuscript to a journal; in the decision of journal editors to send a paper out for review; in the reviewers’ recommendations to accept or reject; and in the final decision whether to accept the paper. Anticipating publication bias may lead researchers to conduct studies and analyze results in ways that increase the probability of getting a significant result and minimize the danger of a non-significant one.

Charles Seife, in his December 2011 talk for Authors@Google, said it succinctly: “Most published research findings are wrong.” Seife was speaking on the topic of his book, Proofiness: The Dark Arts of Mathematical Deception. But Seife is not alone in this opinion, as several others have made the same claim. In 2005, John Ioannidis published “Why Most Published Research Findings Are False” in PLOS Medicine. He said that for many scientific fields, claimed research findings may often be just measures of the prevailing bias of the field. “For most study designs and settings, it is more likely for a research claim to be false than true.” Uri Simonsohn has been called “The Data Detective” for his efforts in identifying and exposing cases of wrongdoing in psychology research.

These and other “misrepresentations” received a lot of attention at the NIH, with a series of meetings to discuss the nature of the problem and the best solutions to address it. Both the NIH (NIH guidelines) and the NIMH (NIMH guidelines) published principles and guidelines for reporting research. The NIH guidelines were developed with input from journal editors from over 30 basic/preclinical journals in which NIH-funded investigators have most often published their results. Thomas Insel, the director of NIMH, noted how the guidelines aimed “to improve the rigor of experimental design, with the intention of improving the odds that results” could be replicated.

Insel thought it was easy to misunderstand the so-called “reproducibility problem” (see “The Reproducibility Problem”). Acknowledging that science is not immune to fraudulent behavior, he said the vast majority of the time the reproducibility problem could be explained by other factors that don’t involve intentional misrepresentation or fraud. He indicated that the new NIH guidelines were intended to address the problems with flawed experimental design. Insel guessed that misuse of statistics (think intentional fraud, as in “The Data Detective”) was only a small part of the problem. Nevertheless, flawed analysis (like p-hacking) needed more attention.

An important step toward fixing the problem is transparent and complete reporting of methods and data analysis, including any data collection or analysis that diverged from what was planned. One could also argue that this is a call to improve the teaching of experimental design and statistics for the next generation of researchers.

From reading several articles and critiques on this issue, my impression is that Insel may be minimizing the problem (see “How to Lie About Research”). Let’s return to Ioannidis, who said that several methodologists have pointed out that the high rates of nonreplication of research discoveries were a consequence of “claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically a p-value less than 0.05.” As Charles Seife commented: “Probabilities are only meaningful when placed in the proper context.”

Steven Goodman noted that p-values are widely used as a measure of statistical evidence in medical research papers, yet they are extraordinarily difficult to interpret. “As a result, the P value’s inferential meaning is widely and often wildly misconstrued, a fact that has been pointed out in innumerable papers and books appearing since at least the 1940s.” Goodman then reviewed twelve common misconceptions about p-values and pointed out the possible consequences of these misunderstandings or misrepresentations. See Goodman’s article for a discussion of the problems.

After his examination of the key factors contributing to the inaccuracy of published research findings, Ioannidis suggested several corollaries that followed. Among them: 1) the smaller the studies conducted, the less likely the research findings will be true; 2) the smaller the effect sizes, the less likely the research findings will be true; 3) the greater the financial and other interests and prejudices in a scientific field, the less likely the research findings will be true; and 4) the hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. The first two fit within Insel’s issue of flawed experimental design, but the third and fourth corollaries do not.

Moonesinghe et al. noted that while they agreed with Ioannidis that most published research findings are false, they were able to show that “replication of research findings enhances the positive predictive value of research findings being true.” However, their analysis did not consider the possibility of bias in the research. They noted that Ioannidis showed how even a modest bias could dramatically decrease the positive predictive value (PPV) of research. “Therefore if replication is to work in genuinely increasing the PPV of research claims, it should be coupled with full transparency and non-selective reporting of research results.”
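For a rough sense of the arithmetic behind these claims, the positive predictive value framework from Ioannidis’s 2005 paper can be written out directly. The sketch below is mine; the pre-study odds, power and bias values are illustrative assumptions, not numbers taken from Ioannidis or Moonesinghe et al.

```python
# Positive predictive value (PPV) of a claimed finding, following the framework
# in Ioannidis (2005). R is the pre-study odds that a probed relationship is
# true, alpha the significance level, beta the type II error rate (1 - power),
# and u the fraction of analyses that bias turns into reported positives.
def ppv(R, alpha=0.05, beta=0.5, u=0.0):
    true_pos = (1 - beta) * R + u * beta * R   # true relationships reported positive
    false_pos = alpha + u * (1 - alpha)        # false relationships reported positive
    return true_pos / (true_pos + false_pos)

# Illustrative numbers only: modest pre-study odds (R = 0.25) and 50% power.
print(f"no bias:  PPV = {ppv(R=0.25, u=0.0):.2f}")   # about 0.71
print(f"20% bias: PPV = {ppv(R=0.25, u=0.2):.2f}")   # about 0.38, more likely false than true
```

Requiring several independent significant replications raises this value, which is Moonesinghe et al.’s point, but only if bias is not allowed to creep into each replication as well.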

While not everyone is supportive of the idea, open access within peer-reviewed scholarly research would go a long way toward correcting many of these problems. Starting in January of 2017, the Bill & Melinda Gates Foundation will require all of its research to be published in an open access manner. Susannah Locke on Vox cited a chart from a 2012 UNESCO report that showed that scholarly publications in clinical medicine and biomedicine have typically been less available through open access than those in other scientific fields. Open access to psychology research began a downward trend in 2003 and was no better than the average for all fields by 2006.

There is a growing movement to widen Open Access (OA) to peer-reviewed scholarly research. The Budapest Open Access Initiative statement said that by “open access” it meant “its free availability on the internet, permitting any user to read, download, copy, distribute, print, search, or link to the full texts of these articles.” Francis Collins, in a video embedded on the Wikipedia page for “Open Access,” noted the NIH’s support for open access. In May 2013, President Obama signed an Executive Order to make government-held data more accessible to the public.

Many of the concerns with academic research discussed here could be quickly and effectively dealt with through open access. The discussion of the “First Blood Test Able to Diagnose Depression in Adults,” looked at in “The Reproducibility Problem,” is an example of the benefit and power open access brings to the scientific process. There is a better future for academic research through open access. It may even knock publication bias out of the academic journals.

01/28/15

How to Lie About Research

Credit: 123RF; copyright: agencyby

According to Charles Seife, “A well-wrapped statistic is better than Hitler’s ‘big lie’; it misleads, yet it cannot be pinned on you.” Twenty-some years ago I bought and read Darrell Huff’s little gem of a book: How to Lie with Statistics. And it seems I wasn’t the only one, particularly when I read about some of the problems with medical science and psychology research. Huff said that while his book might appear to be a primer in the ways to use statistics to deceive others, honest people must also learn them in self-defense.

Thesaurus.com gave 48 synonyms for the verb form of “lie,” including: deceive, mislead, misrepresent, exaggerate, fabricate, misstate, fudge and BS. One or more of these synonyms will be found regularly in the discussion (and linked articles) that follow. But make no mistake: the discussion is still about how the public can be lied to in the health science news they read. Along with a previous article, “The Reproducibility Problem,” this is meant to inform. So let’s look at some of the ways that we are lied to about psychology and medical science research news.

Gary Schwitzer wrote about the problem of exaggeration in health science news releases. He commended an editorial by Ben Goldacre and a research paper by Sumner et al., both published in the BMJ, a peer-reviewed medical journal, on exaggerations in academic press releases and the news reporting they generate. Sumner et al. found that most of the exaggerations identified in their study did not occur ‘de novo’ in the media reports, but were “already present in the text of the press releases produced by academics and their establishments.” And when press releases contained misleading statements, it was likely that the resulting news stories would be misleading as well: “Exaggeration in news is strongly related with exaggeration in press releases.”

The study’s three main outcome measures of exaggeration were: whether causal statements were made about correlational research; whether readers were advised to change their behavior; and whether inferences to humans were drawn from animal studies beyond those already accepted in the literature. The authors concluded:

Our findings may seem like bad news but we prefer to view them positively: if the majority of exaggeration occurs within academic establishments, then the academic community has the opportunity to make an important difference to the quality of biomedical and health related news.

Goldacre noted that while some fixes to the problem were in place, they were routinely ignored. He further suggested that press releases should be treated as part of the scientific publication and subjected to the same accountability, feedback and transparency as the published research. “Collectively this would produce an information trail and accountability among peers and the public.” Schwitzer noted how the academic community had the opportunity to “make an important difference to the quality of biomedical and health related news.”

A review of 2,047 retracted biomedical and life-science research articles by Fang, Steen and Casadevall indicated that only 21.3% of retractions could be attributed to error. A whopping 67.4% were attributable to misconduct, including fraud or suspected fraud (43.4%). The percentage of scientific articles retracted for fraud has risen roughly 10-fold since 1975. “We further note that not all articles suspected of fraud have been retracted.”

But these weren’t the only problems with the current academic research and publication process. A November 2014 article in Nature described a peer-review scam in which journals were forced to retract 110 papers involved in at least 6 instances of peer-review rigging. “What all these cases had in common was that researchers exploited vulnerabilities in the publishers’ computerized systems to dupe editors into accepting manuscripts, often by doing their own reviews.” The article also made recommendations for changing the way editors assign reviewers, particularly the use of reviewers suggested by a manuscript’s author. Cases were noted of authors suggesting friends, and even themselves under a maiden name, as reviewers.

A further concern is p-hacking, also known as data dredging or fishing, among other names. Uri Simonsohn and Joe Simmons, who jointly coined the term, said p-hacking “is trying multiple things until you get the desired result.” They said p-hacking was particularly likely in “today’s environment of studies that chase small effects hidden in noisy data.” Their simulations have shown that changing a few data-analysis decisions can increase the rate of false positives to 60%. Confirming how widespread this is would be difficult, but they found evidence that “many published psychology papers report P values that cluster suspiciously around 0.05, just as would be expected if researchers fished for significant P values until they found one.”
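Here is a toy simulation of my own (not Simonsohn and Simmons’s code) to show the mechanics: there is no true effect at all, but the “researcher” gets a couple of common analysis choices, and the false-positive rate climbs well above the nominal 5%. Combining more such choices is what pushes their simulated rate toward the 60% they cite.

```python
# Toy p-hacking sketch: no real effect, but the analyst may pick either of two
# outcome measures and may collect more data after peeking at the results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_study(n=20):
    a = rng.normal(size=(n, 2))     # control group, two outcome measures
    b = rng.normal(size=(n, 2))     # "treatment" group, no true difference
    pvals = [stats.ttest_ind(a[:, dv], b[:, dv]).pvalue for dv in range(2)]
    if min(pvals) >= 0.05:          # peek, then add 10 more subjects per group
        a = np.vstack([a, rng.normal(size=(10, 2))])
        b = np.vstack([b, rng.normal(size=(10, 2))])
        pvals += [stats.ttest_ind(a[:, dv], b[:, dv]).pvalue for dv in range(2)]
    return min(pvals) < 0.05        # report a "finding" if anything worked

rate = sum(one_study() for _ in range(2000)) / 2000
print(f"false-positive rate with a little flexibility: {rate:.0%}")
# The nominal rate is 5%; even these two small liberties push it well above that.
```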

According to Charles Seife, a journalist and author with degrees in mathematics from Princeton and Yale, a p-value is a rough measure of how likely it is that your observation is a statistical fluke. The lower the p-value, the more confident you can be that your observation isn’t a fluke. Generally, statistical significance is set at p < 0.05. The YouTube video of Seife’s talk runs about 45 minutes, with another twenty-some minutes of question and answer. It gives an understandable presentation of how statistics, including p-hacking, can be misused.
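A small sketch of the “fluke” idea: when two samples are drawn from the same distribution, about one test in twenty will still come out “significant” at p < 0.05 purely by chance. (This is my own illustration, not an example from Seife’s talk.)

```python
# With no true effect, a p-value below 0.05 is exactly the fluke Seife
# describes: it turns up in roughly 5% of tests by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
trials = 10_000
flukes = sum(
    stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue < 0.05
    for _ in range(trials)
)
print(f"significant results with no true effect: {flukes / trials:.1%}")  # roughly 5%
```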

Simonsohn and Simmons devised a method consisting of three simple pieces of information that scientists should include in an academic paper to indicate their data was not p-hacked. Whimsically, they suggested these three rules could be remembered as a song (they need to work on their musical composition skills). First, they preach to the choir. You’ll recognize the “melody” they used when you read the lyrics:

Choir: There is no need to wait for everyone to catch up with your desire for a more transparent science. If you did not p-hack a finding, say it, and your results will be evaluated with the greater confidence they deserve.

If you aren’t p-hacking and you know it, clap your hands.

If you determined sample size in advance, say it.

If you did not drop any variables, say it.

If you did not drop any conditions, say it.

A mere 21 words, included in the Methods section of a paper, would declare the above: “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.”

Scientific theories are in principle subject to revision. And sometimes people’s desires drive them to find explanations that harmonize with their desires and with a worldview that reinforces those desires. (Vern Poythress, Redeeming Science)