Theory and practice of data citation

Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as of directing investments in science. Science is becoming increasingly “data-intensive”: large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated data sets. Yet, given a data set, there is no quantitative, consistent, and established way of knowing how it has been used over time, who contributed to its curation, what results it has yielded, or what value it has. The development of a theory and practice of data citation is fundamental for treating data as first-class research objects with the same relevance and centrality as traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted, and an overall view that brings together the diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation from both the theoretical (the why and what) and the practical (the how) angles.
