Data reproducibility is getting a lot of attention in the scientific community, and NIH is among those seeking to address the issue. At NIGMS, one area we’re focusing on is the needs and opportunities for training in areas relevant to improving data reproducibility in biomedical research. We just issued a request for information to gather input on activities that already exist or are planned as well as on crucial needs that an NIGMS program should address.
I strongly encourage you and your colleagues to submit comments by the February 28 deadline (no longer available). The information will assist us in shaping a program of small grants to support the design and development of exportable training modules tailored for graduate students, postdoctoral students and beginning investigators.
UPDATE: NIGMS and additional NIH components have issued the Training Modules to Enhance Data Reproducibility (R25) funding opportunity announcement. The application deadline is November 20.
My perspective is that of a director of a pre-doctoral training grant program, a mentor to pre- and post-docs, and a chair of 2 departments that have young faculty.
The grants should be designed to generate modules that could help trainees answer discrete questions about the value and veracity of their data and data analysis:
1. Is my data worthy of analysis? Issues here are experimental design (internal/external controls, bias in design, complex interactions of drugs, etc.), basic data validity (images or blots off scale, ethics of retaining or discarding data based on unbiased criteria), reviewing data with a mentor and keeping records, approaching a question in two independent ways, etc.
2. Is my data analysis right? Types of data (discrete, continuous), statistical analysis, image analysis ethics/practices, extra challenges of big data sets (this last item is the one that needs deep attention in the community and could be its own module).
3. How do I publish my data? Perils of supplemental data, responsibility for an adequate methods section, publication ethics (plagiarism, data not shown, etc.), who owns the data and data records, etc.
I post as anonymous primarily because I know there is a great deal of anxiety among my colleagues that is related to the topic of reproducibility. Although the comment by Dr. Montrose is useful, and it echoes the line of thinking at the NIH, it falls significantly short of addressing the real challenge. Borrowing the language of some reporting on the topic, the key issue in reproducibility boils down to “magic sauce” – the stuff that made it possible to get from the raw data to the “exciting” results. In most cases of irreproducible results, when one examines the magic sauce carefully, one of the following scenarios emerges: a) magic sauce is poorly specified, b) magic sauce has magic ingredients, c) magic sauce needs a magic chef, or d) there is no magic sauce!
NIH and the comment by Dr. Montrose seem to be primarily focused on case a). In case b), the magic ingredient could be missing data, missing assumptions, missing annotations, or other information of a similar nature. This is a situation that can clearly be remedied through data management and data curation. The case of magic sauce needing a magic chef is a bit harder. Case b) has added complications in cases where a large volume of data is involved and the examination of algorithms and assumptions is cumbersome. Yet, even in these cases, should NIH require reproducibility, there are technological solutions that address 90% of the problem. Case d) is easy to address after cases a through c are addressed.
NIH and the comment by Dr. Montrose seem to be primarily focused on a) – which, if remedied, will have little impact on b, c, and d. And the scientists I talk to seem to be more anxious about cases b and c. The logic is that cases b and c will have their largest impact on “big science,” which is where a lot of NIH money goes and where full rigor is often missing – either missing in action, or missing in print. Yet this logic is exactly why NIH should turn serious focus to this arena. And this should happen before politics at the level of Capitol Hill turns its focus to why so much of big science cannot be reproduced!