Co-mention network of R packages: Scientific impact and clustering structure

Despite its growing status as a first-class research object, scientific software remains marginal in studies of scholarly communication. This study addresses that gap by examining the co-mention network of R packages across all Public Library of Science (PLoS) journals. To that end, we developed a software entity extraction method and identified 14,310 instances of R packages across the 13,684 PLoS journal papers that mention or cite R. A paper-level co-mention network of these packages was visualized and analyzed using three major centrality measures: degree centrality, betweenness centrality, and PageRank. We analyzed the distribution of R packages across all PLoS papers, identified the packages mentioned most often, and examined the clustering structure of the network. Specifically, we found that the discipline and function of the packages partly explain the largest clusters. The present study offers the first large-scale analysis of the extensive use of R packages in scientific research. As such, it lays the foundation for future explorations of the various roles software packages play in the scientific enterprise.
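As an illustration of what a paper-level co-mention network and two of the centrality measures named above might look like in practice, the following is a minimal sketch, not the study's actual pipeline. The input mapping `papers`, the package lists in it, and all function names are hypothetical; the PageRank routine is a plain power iteration on the weighted undirected graph, and betweenness centrality is omitted for brevity.

```python
from collections import Counter
from itertools import combinations


def co_mention_edges(papers):
    """Count, for each unordered package pair, the papers mentioning both."""
    edges = Counter()
    for packages in papers.values():
        # De-duplicate within a paper so each paper counts a pair once.
        for a, b in combinations(sorted(set(packages)), 2):
            edges[(a, b)] += 1
    return edges


def adjacency(edges):
    """Turn the edge counts into a symmetric weighted adjacency mapping."""
    weights = {}
    for (a, b), w in edges.items():
        weights.setdefault(a, {})[b] = w
        weights.setdefault(b, {})[a] = w
    return weights


def degree_centrality(edges):
    """Share of the other packages that each package is co-mentioned with."""
    weights = adjacency(edges)
    n = len(weights)
    return {p: len(nbrs) / (n - 1) for p, nbrs in weights.items()}


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration (assumed damping factor 0.85)."""
    weights = adjacency(edges)
    n = len(weights)
    strength = {p: sum(nbrs.values()) for p, nbrs in weights.items()}
    rank = {p: 1.0 / n for p in weights}
    for _ in range(iterations):
        rank = {
            p: (1 - damping) / n
            + damping * sum(rank[q] * w / strength[q] for q, w in nbrs.items())
            for p, nbrs in weights.items()
        }
    return rank


# Hypothetical toy input: paper identifier -> R packages mentioned in it.
papers = {
    "paper1": ["ggplot2", "lme4", "vegan"],
    "paper2": ["ggplot2", "lme4"],
    "paper3": ["vegan", "ape"],
}

edges = co_mention_edges(papers)
degrees = degree_centrality(edges)
ranks = pagerank(edges)
```

In this sketch a package that is never co-mentioned with another package does not appear as a node, so every node has at least one neighbor and the power iteration needs no special handling for isolated nodes.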
