Much has been published in the past ten years about the explosion of research data in almost every academic field, and particularly in science, as evidenced by the growth of new computational research disciplines that generate and consume petabytes of data. Research data in the context discussed here includes many varieties: observational and experimental data, statistical models and simulations, still and moving images, three-dimensional models, compilations of multimedia files, and so on. Often research data is aggregated into databases to support queries and analysis, visualizations and other types of post-processing. The scale of data sets ranges from small spreadsheets collected by individual researchers to multi-terabyte files containing high-resolution climate model runs, genome-wide association studies, or large functional magnetic resonance neuroimaging experiments.1
Over the past thirty-odd years the growth of computational science and high-performance computing has raised the stakes for data-intensive research to new levels.2 That trend, combined with increasing pressure from government and other research funders to make the products of their funding more accessible to other researchers and the public, has increased expectations that researchers should freely share their research data and ensure its preservation for future uses not necessarily envisioned by its originator. With this new expectation, research data becomes another mode of scholarly communication alongside books, articles, conference proceedings, etc., taking its place as part of the scholarly record with all that that implies. However, research data is often unlike the static publications we normally think of as scholarly communication. Data does not lend itself to neat, well-defined packaging; it can grow and change over time, and it develops complex relationships to other data.
A few examples of current attempts to elevate data to a useful mode of scholarly communication include: Linked Open Data,3 the Concept Web Alliance’s ‘nano-publications’,4 and experiments with ‘data-enhanced PDFs’.5 But whether data proves to be a separate communication channel or part of an integrated fabric of channels, it will require very different technologies, policies and services to manage it over time. Preserving the literature entails keeping print copies in a safe place or standardizing digital text encoding so that the encoding can be migrated over time. Preserving digital data is much more complicated, since there is no equivalent of ‘text’. But if scholarship is changing to include primary data as a new mode of communication, this raises questions about the roles of the traditional players in scholarly communication, namely researchers, scholarly societies, publishers, and libraries. How will these roles change to respond to data publishing?
Of course, researchers have always generated data in the course of their work, and have often reused that data themselves or aggregated it with additional data to support further research. There has also been a presumption by the research community that data would be made available on request from its creator, for example to validate research results. Most researchers have kept their data around for as long as they felt it had continuing value, but have done so in ad hoc ways with mixed success. A small number of disciplines have developed very sophisticated infrastructure for managing research data: high-energy
Communicating with data: new roles for scientists, publishers and librarians 203
[1] Paul Groth. The Anatomy of a Nano-publication, 2010.
[2] Raym Crow et al. The case for institutional repositories: a SPARC position paper, 2002.
[3] Terrence B. Bennett et al. Data Sharing: Academic Libraries and the Scholarly Enterprise, 2011.
[4] John Kunze et al. Practices, Trends, and Recommendations in Technical Appendix Usage for Selected Data-Intensive Disciplines, 2011.
[5] Tony Hey et al. The Fourth Paradigm: Data-Intensive Scientific Discovery, 2009.
[6] Tracy Gabridge. The Last Mile: Liaison Roles in Curating Science and Engineering Research Data, 2009.
[7] Jonathan Rees. Recommendations for independent scholarly publication of data sets, 2010.