02/3/17

“Political” Science?


A 2014 study by a well-known researcher from Columbia University indicated that “Sexual minorities living in communities with high levels of anti-gay prejudice experienced a higher hazard of mortality than those living in low-prejudice communities.” The press release described it as the first study to look at the consequences of anti-gay prejudice for mortality. The study’s lead author, Mark Hatzenbuehler, said: “The results of this study suggest a broadening of the consequences of prejudice to include premature death.” The authors thought their results highlighted the importance of examining structural forms of stigma and prejudice as social determinants of health and longevity among minority populations. A significant and potentially important finding—except it may not be true.

The original study, “Structural Stigma and All-Cause Mortality in Sexual Minority Populations” by Hatzenbuehler et al., was published in the February 2014 issue of Social Science & Medicine. Another researcher, Mark Regnerus, set out to replicate the Hatzenbuehler et al. study, but was not able to do so. Regnerus included a more refined imputation strategy in his replication, but still failed to find any significant results: “No data imputation approach yielded parameters that supported the original study’s conclusions.” Regnerus said:

Ten different approaches to multiple imputation of missing data yielded none in which the effect of structural stigma on the mortality of sexual minorities was statistically significant. Minimally, the original study’s structural stigma variable (and hence its key result) is so sensitive to subjective measurement decisions as to be rendered unreliable.

Writing for National Review, Maggie Gallagher said that Regnerus’s failure to replicate the Hatzenbuehler et al. study amounted to a repudiation of that study. She went further, suggesting the study may have been faked: “When social justice displaces truth as the core value of academics, bad things happen to science.” She implied Hatzenbuehler might have slipped a bogus study into a major social-science journal, “confident that nobody would want to review and contest its findings, which so please the overwhelmingly liberal academy.”

Gallagher then referred to Mark Regnerus as an emerging scientific hero, a “modern-day Galileo standing up to the new theology of the Left.” But I think she misses the point. Both Hatzenbuehler and Regnerus are doing exactly what they are supposed to do in science: publishing their results and attempting to replicate the research of others. Henry Bauer, a professor of Chemistry & Science Studies at Virginia Polytechnic Institute and State University, describes how the “knowledge filter” in science can help uncover the real failures and confirm the true successes.

Bauer asks what would happen if most scientists rounded off or fudged their findings, or thought more about the results others wanted than about what an experiment actually showed. “To understand why science may be reliable or unreliable, you have to recognize that science is done by human beings, and that how they interact with one another is absolutely crucial.” He then described how frontier science leads to publication in the primary literature.

If those [findings] seem interesting enough to others, they’ll be used and thereby tested and perhaps modified or extended – or found to be untrue. Whatever survives as useful knowledge gets cited in other articles and eventually in review articles and monographs, the secondary literature, which is considerably more consensual and reliable than the primary literature.

Regnerus’s findings themselves have to be replicated, by more than one additional study, before Gallagher’s assessment that Regnerus repudiated Hatzenbuehler et al. can be confirmed. Concluding the study was faked or bogus based solely on his findings is irresponsible and goes beyond what Regnerus himself said.

Regnerus said the findings of the Hatzenbuehler et al. study seemed to be very sensitive to subjective decisions made about the imputation of missing data, “decisions to which readers are not privy.” He also thought the structural stigma variable itself was questionable: “Hence the original study’s claims that such stigma stably accounts for 12 years of diminished life span among sexual minorities seems unfounded, since it is entirely mitigated in multiple attempts to replicate the imputed stigma variable.” He thought his study highlighted the importance of cooperation and transparency in science.

The unavailability of the original study’s syntax and the insufficient description of multiple imputation procedures leave unclear the reasons for the failed replication. It does, however, suggest that the results are far more contingent and tenuous than the original authors conveyed. This should not be read as a commentary on missing data or on the broader field of the study of social stigma on physical and emotional health outcomes, but rather as a call to greater transparency in science (Ioannidis, 2005). While the original study is not unique in its lack of details about multiple imputation procedures, future efforts ought to include supplementary material (online) enabling scholars elsewhere to evaluate and replicate studies’ central findings (Rezvan et al., 2015). This would enhance the educational content of studies as well as improve disciplinary rigor across research domains.
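To make concrete what “sensitivity to imputation decisions” means, here is a minimal sketch in Python using entirely synthetic data. The variable names (“stigma,” “outcome”) and all of the numbers are illustrative assumptions, not values from either study; the point is only that two common ways of filling in a partially missing predictor can produce noticeably different coefficients and test statistics from the same model.

```python
# Synthetic illustration of how imputation choices can change a regression result.
# Names ("stigma", "outcome") and numbers are made up; nothing here comes from
# Hatzenbuehler et al. (2014) or from Regnerus's replication.
import numpy as np

rng = np.random.default_rng(0)
n = 300

covariate = rng.normal(size=n)                 # a fully observed control variable
stigma = 0.8 * covariate + rng.normal(size=n)  # predictor, correlated with the control
outcome = 0.5 * covariate + rng.normal(size=n) # outcome driven by the control, not by stigma

# Make roughly 40% of the stigma values missing, more often when the covariate is high
missing = rng.random(n) < (0.2 + 0.4 * (covariate > 0))
stigma_observed = np.where(missing, np.nan, stigma)

def slope_and_t(x, y):
    """OLS slope of y on x (with intercept) and its t statistic."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se

# Choice 1: fill missing stigma values with the observed mean
filled_mean = np.where(np.isnan(stigma_observed), np.nanmean(stigma_observed), stigma_observed)

# Choice 2: predict missing stigma values from the covariate (regression imputation)
obs = ~np.isnan(stigma_observed)
b, _ = slope_and_t(covariate[obs], stigma_observed[obs])
predicted = stigma_observed[obs].mean() + b * (covariate - covariate[obs].mean())
filled_reg = np.where(np.isnan(stigma_observed), predicted, stigma_observed)

for label, x in [("mean imputation", filled_mean), ("regression imputation", filled_reg)]:
    slope, t = slope_and_t(x, outcome)
    print(f"{label:22s} slope = {slope:+.3f}   t = {t:+.2f}")
```

Proper multiple imputation repeats a stochastic version of this fill-in step many times and pools the estimates; the single-shot imputations above are deliberately crude, but they show why two analysts making different (and undisclosed) choices can reach different conclusions from the same data.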

Regnerus is not a scientific hero and Hatzenbuehler is not a research villain. But two other individuals identified by Gallagher in her article may fit within those categories.

Michael LaCour co-authored a paper with Donald Green that was published in the prestigious journal Science in December of 2014. The article summary said: “LaCour and Green demonstrate that simply a 20-minute conversation with a gay canvasser produced a large and sustained shift in attitudes toward same-sex marriage for Los Angeles County residents.” Green is a highly respected political science professor, now at Columbia. LaCour was a political science grad student at UCLA.

Back in September of 2013, Michael LaCour met with David Broockman at the annual meeting of the American Political Science Association and showed him some of the early results of his study. Writing for NYMag.com, Jesse Singal noted how Broockman was “blown away” by some of the results LaCour shared with him. LaCour also told Broockman he was looking to get Donald Green as a coauthor on the paper. Coincidentally, Green had been Broockman’s undergraduate advisor at Yale.

Singal pointed out that LaCour’s results were noteworthy precisely because they contradicted every established belief about political persuasion: “The sheer magnitude of effect LaCour had found in his study simply doesn’t happen — piles of previous research had shown that.” In early 2015, Broockman decided to replicate LaCour’s findings. The first clue that something was wrong came when he realized the estimated cost of a replication would be a cool million dollars. Where would a grad student like LaCour get the money or funding for a study like that? That first anomaly eventually led to “Irregularities in LaCour (2014),” a 27-page report Broockman co-authored with Josh Kalla and Yale University political scientist Peter Aronow.

“Irregularities” is diplomatic phrasing; what the trio found was that there is no evidence LaCour ever actually collaborated with uSamp, the survey firm he claimed to have worked with to produce his data, and that he most likely didn’t commission any surveys whatsoever. Instead, he took a preexisting dataset, passed it off as his own, and faked the persuasion “effects” of the canvassing. It’s the sort of brazen data fraud you just don’t see that often, especially in a journal like Science.

Green quickly emailed the journal and asked for a retraction, which he received. When contacted about comments that he had failed in his supervisory role for the study, Green said that assessment was entirely fair: “I am deeply embarrassed that I did not suspect and discover the fabrication of the survey data and grateful to the team of researchers who brought it to my attention.”

LaCour had a job offer as an incoming assistant professor at Princeton rescinded. He also reportedly lied about several items on his curriculum vitae, including grants and a teaching award. You can review a post mortem of the LaCour controversy by Neuroskeptic for Discover Magazine here. Neuroskeptic thought LaCour’s objections to Broockman et al. were weak and failed to refute their central criticism.

Cases of apparent scientific fraud, like that of LaCour, draw attention to themselves when they are discovered. Writing for STAT News, Ivan Oransky and Adam Marcus described a survey of working scientists by researchers in the Netherlands. The scientists were asked to score 60 research misbehaviors according to their impressions of how often the misbehaviors occur, their preventability, their impact on truth (validity), and their impact on trust between scientists. The respondents were more concerned with sloppy science than with scientific fraud. Fraud, when it occurs, has a significant impact on truth and public trust. But those cases are rare, and detected cases are even rarer. They concluded:

Our ranking results seem to suggest that selective reporting, selective citing, and flaws in quality assurance and mentoring are the major evils of modern research. A picture emerges not of concern about wholesale fraud but of profound concerns that many scientists may be cutting corners and engage in sloppy science, possibly with a view to get more positive and more spectacular results that will be easier to publish in a high-impact journal and will attract many citations. In the fostering of responsible conduct of research, we recommend to develop interventions that actively discourage the high-ranking misbehaviors from our study.

So it would seem the problems with the Hatzenbuehler et al. study are not fraud, but may be due to smaller, more pervasive issues, such as shoddy methodology. The LaCour case catches more attention and generates more mistrust because of its apparent fraud. But Oransky and Marcus are right: although not as flashy as fraudulent research, the smaller, less outrageous research sins are a greater threat to scientific credibility. Gallagher may have let her own ideology influence how she emphasized these two cases, but she was unquestionably right in her concluding remarks: “Science is not right-wing or left-wing. But to work, it needs scientists fearlessly committed to truth over their preferred outcomes.”

01/14/15

The Reproducibility Problem


In January of 2014, a Japanese stem cell scientist, Dr. Haruko Obokata, published what looked like groundbreaking research in the journal Nature suggesting that stem cells could be made quickly and easily. But as James Gallagher of the BBC noted, “the findings were too good to be true.” Her work was investigated by the Riken center where she conducted her research, amid concern within the scientific community that the results had been fabricated. In July, the original article was retracted, with the retraction noting the presence of “multiple errors.” Obokata was also found guilty of misconduct. In December of 2014, Riken announced that its attempts to reproduce the results had failed. Dr. Obokata resigned, saying: “I even can’t find the words for an apology.”

The ability to repeat or replicate someone’s research is the way scientists weed out nonsense, stupidity, and pseudo-science from legitimate science. In Scientific Literacy and the Myth of the Scientific Method, Henry Bauer described a “knowledge filter” that illustrates this process. The first stage is research, or frontier science. The research is then presented to the editors and referees of scientific journals for review, in hopes of being published. It may also be presented to other interested parties in seminars or at conferences. If the research successfully passes through this first filter, it is published in the primary literature of the respective scientific field and passes into the second stage of the knowledge filter.

The second filter consists of others trying to replicate the initial research, or to modify or extend it. This is where the reproducibility problem occurs: the majority of these replication attempts fail. But if the original results can be replicated, they go on to be cited in review articles and monographs (the third stage). After being successfully replicated, the original research is seen as “mostly reliable,” according to Bauer.

So while the stem cell research of Dr. Obokata made it through the first filter to the second stage, it seems that it shouldn’t have. The implication is that Nature didn’t do a very good job reviewing the data submitted to it for publication. However, when the second filtering process began, it detected the errors that should have been caught by the first filter and kept what was poor science from being accepted as reliable science.

A third filter explores the concordance of the research results with other fields of science. There is also continued research by others who again confirm, modify, and extend the original findings. When the original research successfully comes through this filter, it is “mostly very reliable” and will be included in scientific textbooks.

Francis Collins and Lawrence Tabak of the National Institutes of Health (NIH) commented that “Science has long been regarded as ‘self-correcting’, given that it is founded on the replication of earlier work.” But they noted how the checks and balances built into the process of doing science—the ones that once helped ensure its trustworthiness—have been compromised, leaving researchers unable to reproduce initial findings. Think here of how Obokata’s stem cell research was approved for publication in Nature, one of the most prestigious science journals.

The reproducibility problem has become a serious concern in research on psychiatric disorders. Thomas Insel, the Director of the National Institute of Mental Health (NIMH), wrote a November 14, 2014 blog post on the “reproducibility problem” in scientific publications. He said that “as much as 80 percent of the science from academic labs, even science published in the best journals, cannot be replicated.” Insel said this failure was not always because of fraud or the fabrication of results. Perhaps his comment was made with the above discussion of Dr. Obokata’s research in mind. Then again, maybe it was made in regard to the following study.

On September 16, 2014, the journal Translational Psychiatry published a study done at Northwestern University that was promoted as the “First Blood Test Able to Diagnose Depression in Adults.” Eva Redei, co-author of the study, said: “This clearly indicates that you can have a blood-based laboratory test for depression, providing a scientific diagnosis in the same way someone is diagnosed with high blood pressure or high cholesterol.” A surprise finding of the study was that the blood test also predicted who would benefit from cognitive behavioral therapy. The study was supported by grants from the NIMH and the NIH.

The Redei et al. study received a good bit of positive attention in the news media. It was even called a “game changing” test for depression. WebMD, Newsweek, Huffington Post, US News and World Report, Time and others published articles on the research—all on the Translational Psychiatry publication date of September 16th. Then James Coyne, PhD, published a critique of the press coverage and the study in his “Quick Thoughts” blog, systematically taking apart the claims of the Redei et al. study. Responding to Dr. Redei’s quote in the above paragraph, he said: “Maybe someday we will have a blood-based laboratory test for depression, but by themselves, these data do not increase the probability.”

He wondered why these mental health professionals would make such “misleading, premature, and potentially harmful claims.” In part, he thought, it was because it is fashionable and newsworthy to claim progress toward an objective blood test for depression: “Indeed, Thomas Insel, the director of NIMH is now insisting that even grant applications for psychotherapy research include examining potential biomarkers.” Coyne ended with quotes indicating that Redei et al. were hoping to monetize their blood test. From an article on Genomeweb.com, he quoted: “Now, the group is looking to develop this test into a commercial product, and seeking investment and partners.”

Coyne then posted a more thorough critique of the study, which he said would allow readers to “learn to critically examine the credibility of such claims that will inevitably arise in the future.” He noted how the small sample size contributed to its strong results—which are unlikely to be replicated in other samples. He also cited much larger studies looking for biomarkers for depression that failed to find evidence for them. His critique of the Redei et al. study was devastating, and the comments from others seemed to agree.
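Coyne’s point about sample size is a general one, and a quick simulation makes it concrete. Everything below is synthetic; the sample sizes and the assumed true correlation are arbitrary illustrations, not figures from the Redei et al. study. The idea is simply that a small sample can throw up an impressive-looking association by chance, which then shrinks or disappears when a larger sample is drawn.

```python
# Toy simulation: small samples often produce strong correlations by chance,
# which then shrink on replication. Sample sizes and the "true" correlation
# are arbitrary illustrations, not parameters from the Redei et al. study.
import numpy as np

rng = np.random.default_rng(1)
true_r = 0.1                 # assume the biomarker-depression correlation is actually weak
n_small, n_large = 20, 1000  # a small discovery sample vs. a large replication sample
n_sims = 5000

def sample_r(n):
    """Draw n pairs with population correlation true_r and return the sample Pearson r."""
    x = rng.normal(size=n)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)
    return np.corrcoef(x, y)[0, 1]

small = np.array([sample_r(n_small) for _ in range(n_sims)])
large = np.array([sample_r(n_large) for _ in range(n_sims)])

# Impressive-looking estimates (|r| > 0.4) show up in a noticeable fraction of
# small samples purely by chance, but almost never in large ones.
print(f"n = {n_small:4d}: share of simulations with |r| > 0.4 = {np.mean(np.abs(small) > 0.4):.1%}")
print(f"n = {n_large:4d}: share of simulations with |r| > 0.4 = {np.mean(np.abs(large) > 0.4):.1%}")
```

If journals and press offices then preferentially showcase the impressive-looking runs, inflated effects from small samples are almost guaranteed to dominate the headlines.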

But how could these researchers be so blind? Redei et al. apparently believed, without question, that there is a biological cause for depression. Their commitment to this belief affected how they did their research, to the extent that they were blind to the problems Coyne pointed out. Listen to the video embedded in the link “First Blood Test Able to Diagnose Depression in Adults” to hear Dr. Redei acknowledge that she believes depression is a disease like any other disease. Otherwise, why attempt to find a blood test for depression?

Attempts to replicate the Redei et al. study, if they are made, will likely raise further questions and (probably) refute what Coyne described as a study with a “modest sample size and voodoo statistics.” Before we go chasing down another dead end in the labyrinth of failed efforts to find a biochemical cause for depression, let’s stop and be clear about whether this “game changer” is really what it claims to be.