Data publication consensus and controversies

The movement to bring datasets into the scholarly record as first-class research products (validated, preserved, cited, and credited) has been inching forward for some time, but now the pace is quickening. As data publication venues proliferate, significant debate continues over formats, processes, and terminology. Here, we present an overview of data publication initiatives underway and the current conversation, highlighting points of consensus and issues still in contention. Data publication implementations differ in several respects, including the kind of documentation, the location of the documentation relative to the data, and how the data is validated. Publishers may present the data as supplemental material to a journal article, with a descriptive "data paper," or independently. Complicating the situation, different initiatives and communities use the same terms to refer to distinct but overlapping concepts. For instance, to virtually everyone the term "published" means that the data is publicly available and citable, but it may or may not imply that the data has been peer-reviewed. In turn, what is meant by data peer review is far from defined; standards and processes encompass the full range employed in reviewing the literature, plus some novel variations. Basic data citation is a point of consensus, but the general agreement on the core elements of a dataset citation frays if the data is dynamic or part of a larger set. Even as data publication is being defined, some are looking past publication to other metaphors, notably "data as software," for solutions to the more stubborn problems.
