Data Curation Education and Biological Information Specialists

Scientific data problems do not stand in isolation but are part of a larger set of challenges associated with the expansion of scientific information-gathering capacity and changes in scholarly communication in the digital environment. These problems require new kinds of expertise in key areas, such as ontology development, data federation, and data visualization. However, recent reports on cyberinfrastructure and e-science initiatives acknowledge a shortage of qualified professionals to manage the increasing stores of data across the sciences (NSB, 2005). To build this kind of professional capacity, we have developed two complementary educational programs at the Graduate School of Library and Information Science (GSLIS) at the University of Illinois at Urbana-Champaign. One is the Biological Information Specialist (BIS) Master of Science degree, and the other is a concentration in Data Curation within the Master of Science in Library and Information Science. In this paper we discuss the key features of our Data Curation Education Program (DCEP), our approach to curriculum development, and the data curation program’s contribution to the BIS curriculum. To provide a foundation for course development, we are identifying best practices in data curation, drawing from a variety of resources, including (1) a research foundation of information science projects in the biological sciences related to data curation; (2) a base of domain scientist collaborators; (3) active participation in disciplinary international standards development; and (4) an advisory group representing bioinformatics, bench and field biosciences, and the information professions.

Curation for Integrative Biological Sciences

Scientific data problems are an integral part of the radical changes taking place in the practice of science. New scientific questions, new digital instrumentation, and new levels of interdisciplinary integration are leading to a dramatic increase in both data and derived information.
Simultaneously, advances in communication options are transforming scholarly communication in the digital environment. For the biological sciences, these complexities are compounded by the need to integrate data across scales. Many biologists are attempting to integrate, or at least communicate, their findings across scales of size, from molecules to organisms, and across ecosystems. In terms of time scales, some are working to integrate data from subsecond molecular interactions through evolutionary time (see, for example, Wooley & Lin, 2005). This work will require new kinds of expertise in key areas, such as ontology development, data federation, and data visualization. However, recent reports on e-research, cyberinfrastructure, and the stewardship of digital assets acknowledge a significant deficit in the workforce required to manage these burgeoning data stores. To address this growing need, with support from the Institute of Museum and Library Services (IMLS), the Graduate School of Library and Information Science (GSLIS) at the University of Illinois at Urbana-Champaign has initiated a