I recently had the opportunity to talk to Phil Bourne, NIH’s associate director for data science, about some of the current Big Data to Knowledge (BD2K) initiative activities. I asked him how they tie together his vision of a digital enterprise for biomedical research and how they might benefit NIGMS grantees.
Phil explained that the goal of his office, commonly referred to as ADDS, is to make biomedical research more efficient, for example by making it easier for researchers to locate and manipulate data and software. “If we could just achieve a 5 percent improvement in efficiency in research that would be, in NIH budget dollars, more than $150 million a year that could be spent on funding more people and doing more research,” he said.
One area in which we at NIGMS are actively working with ADDS is sustaining biomedical data resources, a fair number of which we support. As someone who previously set up databases and who now oversees them, I’m very passionate about this topic. A key question is how to sustain support of data resources in the current research budget environment. Led by Phil’s team, NIH has issued a request for information on sustaining biomedical data repositories that seeks input on every aspect of maintaining these resources. I encourage you to share your ideas by the March 18 response date.
Training is important in Phil’s vision for a digital enterprise, too. He told me of a number of recent training activities at NIH, including a “software carpentry” workshop for experimental researchers to learn how to use a wide variety of analysis tools. In a blog post about this and another event, the ADDS office asks for suggestions on other types of data science courses to offer. They want to provide workshops that train more experimentally oriented scientists to work with big data and take those skills back to their labs. In addition, the ADDS office is planning to set up a workforce development center to catalog classroom and online courses in the data sciences.
Another effort that’s in the works is creating a virtual space called the Commons where researchers can share, locate, use and cite datasets, software, standards definitions and documentation. Phil anticipates that the first components of the Commons will be available in 2016.
I’m really excited about Phil’s efforts and believe that they will help drive the “data quantum leap” I described in my first Feedback Loop blog post.