Research software citation in the Data Citation Index: Current practices and implications for research software sharing and reuse

Abstract

The aim of this study is to explore the phenomenon of research software citation and, in particular, to draw attention to the growing importance of this form of citation in scholarly communication. The study sheds light on the current status of formal software citation as captured by citation databases. Data were gathered from more than 67,000 research software records held in public repositories indexed by Clarivate Analytics’ Data Citation Index (DCI), and the metadata characteristics of the indexed records and their citation data were then analyzed. Research software was rarely cited in the DCI, suggesting either that research software is rarely reused or that its reuse is poorly documented. Institutional repositories attracted few citations and had a low citation rate. However, the available data did not make it possible to isolate specific identifiers that promote formal software citation. These findings offer insights into research software citation that will be of interest to funding agencies, publishers, researchers, and research organizations.
