Adding software to package management systems can increase their citation by 280%

A growing number of biomedical methods and protocols are being disseminated as open-source software packages. When combined with other packages, they can execute in-depth and comprehensive computational pipelines. Their adoption therefore depends not only on their availability but also on how readily they integrate with other software packages. Accordingly, package management systems are developed to standardize the discovery and integration of software packages. Here we study the impact of package management systems on software dissemination and scholarly recognition. We study the citation patterns of more than 18,000 scholarly papers referenced by more than 23,000 software packages hosted by Bioconda, Bioconductor, BioTools, and ToolShed, the package management systems primarily used by the bioinformatics community. Our results provide significant evidence that scholarly papers’ citation counts increase after their respective software is published to a package management system. Additionally, our results show that the different package management systems have an impact of the same magnitude on the scholarly papers’ recognition. These results may motivate scientists to distribute their software via package management systems, facilitating the composition of computational pipelines and helping reduce redundancy in package development.

Significance Statement

Software packages are the building blocks of computational pipelines. A myriad of packages are developed; however, the lack of integration and discovery standards hinders their adoption, leaving most scientists’ scholarly contributions unrecognized. Package management systems are developed to facilitate software dissemination and integration. However, developing software to meet their code and packaging standards is an involved process. Therefore, our results on the significant impact of package management systems on scholarly papers’ recognition can motivate scientists to invest in disseminating their software via package management systems. Disseminating more software via package management systems will make computational pipelines more straightforward to compose and reduce redundancy among software packages.
