Good science gone wrong?
Most scientists want to tell the truth. We want to help people by answering important questions, and sharing what we learn. But the research endeavor is big and messy. And as we’ve learned from the climate change and HIV/AIDS debates, there will always be folks who favor controversy, dogma, and press coverage over scientific consensus.
Sadly, last week we saw a major step backward in global health, with the launch of a media frenzy around children’s deworming.
The blogosphere is buzzing about a group of epidemiologists at the London School of Hygiene and Tropical Medicine, led by Alex Aiken, who carried out the verification and reanalysis of a landmark 2004 trial of school-based deworming.
The original study, by Michael Kremer and Ted Miguel, measured school participation across two groups of children in Western Kenya: those who received deworming treatments, and those who did not (although the control group received the pills a few years later). The study was unique because Miguel and Kremer took care to measure the spillover effects of deworming: treating some kids can interrupt transmission for nearby children who are not treated, which provides an important benefit.
The punchline from the original study was profound: kids who received deworming were absent from school 25% less than those in the control group. Essentially, healthier kids were spending more time in the classroom. A long-term follow-up with the same study population suggests that, on average, kids who were dewormed are earning 20% more as adults.
Parallel studies find similar results, including an evaluation of mass deworming in Uganda and a retrospective evaluation of a deworming campaign in the Southern United States, implemented in the early 1900s by the Rockefeller Sanitary Commission.
There is a consensus among many academic researchers, the World Health Organization, and other technical experts that children’s deworming campaigns represent good value for money, particularly in communities with a heavy burden of worm infections. As a result, more than 100 million Kenyan and Indian kids are now receiving deworming medicines, with support from their governments.
But the re-analysis by the London School suggests a departure from this consensus, and indeed a departure from standard statistical methods. The team does verify that deworming increases school attendance in a “pure” replication of the study, but they also carry out a statistical re-analysis (Davey et al.) that introduces several unconventional judgments into the mix.
They show that you can eliminate the impact of deworming on school attendance — if you torture the data. In summary, they:
- redefine how treatment was assigned in the study, in a way that is inconsistent with the protocols actually used to assign treatment,
- massage the measures used to estimate school absenteeism (i.e., use non-standard and controversial reweightings of the data), and
- break the sample into sub-groups, which dramatically reduces statistical power (the simulation sketch below illustrates how much power this costs).
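To make that last point concrete, here is a minimal simulation sketch. The sample sizes and effect size are purely illustrative assumptions, not figures from the actual trial; the point is generic: cutting a fixed sample into quarters sharply lowers the odds that a standard two-sample test detects a true effect of the same size.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulated_power(n_per_arm, effect=0.2, sims=2000):
    """Fraction of simulated trials in which a two-sided z-test
    at the 5% level detects a true effect of `effect` SDs."""
    detections = 0
    for _ in range(sims):
        treated = rng.normal(effect, 1.0, n_per_arm)
        control = rng.normal(0.0, 1.0, n_per_arm)
        # Standard error of the difference in means
        se = np.sqrt(treated.var(ddof=1) / n_per_arm +
                     control.var(ddof=1) / n_per_arm)
        if abs(treated.mean() - control.mean()) / se > 1.96:
            detections += 1
    return detections / sims

# Hypothetical numbers, chosen only to illustrate the mechanics:
print("full sample (500 per arm):", simulated_power(500))        # roughly 0.89
print("quarter sub-group (125 per arm):", simulated_power(125))  # roughly 0.35
```

Under these assumptions, the same true effect that the pooled sample detects nearly nine times out of ten becomes “statistically insignificant” in most sub-group tests. That is why carving up a trial and reporting null sub-group results can be so misleading.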
This fishing exercise was “successful” in getting results that eliminated the effects. One wonders how many other unreported analyses were run before the authors arrived at this particular set of approaches.
Most of my colleagues in the academic community would reject this sort of data mining. I’ve reviewed the replications myself, and I have major concerns with their methodological choices. And this isn’t about a difference between public health and economics. I sit in a public health school and in an economics department, and my colleagues on both sides of campus use the same statistics. We both run randomized, controlled trials. And while we may use different jargon, we share the same methods.
I’m not the only researcher questioning the choices made by Aiken and Davey (see posts from Chris Blattman, Berk Ozler, Michael Clemens, and Alexander Berger).
And I do think that replications are an essential part of science. Existing data should be reanalyzed by independent, outside researchers. This is how we uncover truth. The recent case of LaCour and Green is a great example of this: two graduate students discovered inconsistencies in a study of voter behavior in California, which prompted retraction of the flawed publication.
But here’s the rub: what if the replication itself is widely seen as flawed? I think that replications should be held to the same standards as any other study.
In an ideal world, journalists would catch this subtlety. But science requires a great deal of judgment, and one researcher’s judgment might not stand up to the broader community’s scrutiny. This is why an independent peer review process is so important. It makes me wonder why Aiken et al. decided to publish their work in the International Journal of Epidemiology, whose editor-in-chief is a colleague in the same department as the replication authors. There is no evidence that the journal loosened its standards for this paper, but frankly, for an issue as sensitive as children’s deworming, the optics would have been better had the authors chosen a more independent venue for publishing their work.
If the media’s process of self-correction works, journalists will start reporting some of the more critical views of this reanalysis. But so far, they have narrowly focused on the replication’s findings, without giving a voice to the other academics (including the original study authors) who are raising concerns.
So as of last week, journalists and bloggers were reporting that deworming no longer holds up. They are claiming that it has been “debunked” as a cost-effective way to improve children’s school participation. Yet these reporters never contacted the original study authors, nor did they look at other studies that found similar effects in other settings. This is irresponsible, because government policy is highly vulnerable to claims made by the media.
In the early 2000s, I was a policymaker at the World Bank – chief economist for human development. My job involved reviewing the evidence and deciding how to apply it. We funded programs like mass deworming based on a large amount of evidence from multiple studies. But we would never consider overturning an evidence-based program simply because of one bad replication.
For full disclosure, I should also mention that I am at the same university but not the same department as one of the original 2004 study authors, and I chaired the board of the International Initiative for Impact Evaluation (3ie), the organization that paid the researchers to conduct this replication. I am no longer involved with 3ie, and I was not part of the decision to fund the study. However, I have been involved in other replications funded by 3ie, and the experience has not been positive. As an academic community, we need to come together and set common standards for replication. This is not something we should outsource to researchers for hire.
A final note, on transparency: some articles suggest that data and code from the 2004 study were just recently released — but in fact they were made available more than 8 years ago (in January 2007) and have been shared with dozens of scholars. Back in 2007, Miguel and Kremer did acknowledge a series of minor rounding errors and a coding error in the original work, which caused them to overestimate the impacts of deworming on nearby untreated children.
But they took the time to correct their errors, and they published an updated set of results online. They also actively disseminated the new results to key stakeholders. This makes the story a bit messier, but that’s the process of discovery. It’s never linear.
It would be great if every research group (and journalist!) showed such a commitment to openness and scholarly values. Without this commitment, we will probably keep rolling back our progress in science — rather than advancing the frontiers.
—
Original article here