An uncertain future for biological databases
An article in the most recent issue of Science highlights a growing concern about the continued support of the biological databases on which our community depends. Indeed, 2015 GSA President Jasper Rine was quoted as saying these resources are “critical for our daily life as geneticists and biomedical researchers.”
Many of the model organism databases (MODs) used by members of the GSA community, including FlyBase, WormBase, SGD, ZFIN, and MGI, have been supported by NIH’s National Human Genome Research Institute (NHGRI), as have resources that serve human and other research, such as OMIM, the Gene Ontology Consortium, and UniProt. According to data compiled by Science, NHGRI provided more than $17 million in funding to support the MODs that serve the C. elegans, Drosophila, mouse, yeast, and zebrafish communities, part of the $30 million it spends on all databases.
But as the data cataloged in these resources continues to extend beyond genomes, NHGRI is concerned that it is bearing all of the costs. NHGRI Director Eric Green was quoted as saying, “we’re not a good long-term home.” And with the continuing explosion in the amount of biological data and increasing pressures on funding, the challenges are only growing.
NIH is considering other funding models, including recruiting financial support from other sources, encouraging infrastructure sharing among databases, and charging for use of the databases. For example, when the National Science Foundation eliminated funding for The Arabidopsis Information Resource (TAIR), TAIR was forced to move toward a subscription model.
GSA member Janan Eppig, who is PI for the Mouse Genome Database, and Monte Westerfield, who runs the Zebrafish Model Organism Database, are concerned about this precedent. They worry that subscription paywalls would restrict access to the data and make it even more difficult to link data across databases.
This past summer, Rine participated in a meeting convened by NHGRI to discuss the future of these databases, and GSA has also worked to draw the attention of other NIH institutes to this issue. The sustainability of MODs has also been a frequent topic of conversation between GSA and both NHGRI and the National Institute of General Medical Sciences (NIGMS). Indeed, NIGMS Director Jon Lorsch has stressed the importance of treating such resources as infrastructure rather than as research grants.
GSA has also emphasized the importance of the model organism databases in response to several NIH requests for information. For example, in a March 2015 comment, GSA indicated that these resources should not be forced to be self-sustaining as they are essential to the overall research enterprise and provide value significantly greater than the sum of the value to individual investigators. GSA also cautioned against a subscription model, saying that “even a small fee would be a disincentive for accessing validated and current data. It would also have a negative impact on the use of such resources by scientists working on other organisms—often motivated by hunches from other MODs—more casual users or those who lack significant funds, such as researchers at under-resourced institutions and those using such data repositories for educational purposes.”
In the long run, the National Library of Medicine (NLM) might become the home for such databases, as part of its mission to serve as the data science hub at NIH. A search for the new NLM director is underway, with Green and Lorsch chairing the search committee.
- Jocelyn Kaiser, “Funding for key data resources in jeopardy,” Science 351: 14.