Academic software downloads from Google Code: useful usage indicators?

Introduction. Computer scientists and other researchers often make their programs freely available online. If this software makes a valuable contribution inside or outside academia, its creators may want to demonstrate this with a suitable indicator, such as download counts.

Methods. Download counts, citation counts, labels and licences were extracted for programs that were both hosted in the Google Code software repository and cited in Scopus.

Analysis. Download counts were correlated with Scopus citations, the distributions of both were compared, and common software labels and licensing arrangements were identified.

Results. Although downloads correlate positively and significantly with Scopus citations, the correlation is weak (0.3), partly because some software has a large natural audience outside academia. There is no consensus on the best licence for shared software: no single licence was chosen by more than about a fifth of the projects. The most common language label was Java (20%) and, excluding generic computing terms, the most common topic labels were Google (5%), security (3%) and bioinformatics (3%).

Conclusions. Download counts can give evidence of wider non-academic uses of software. However, software that is apparently not primarily designed for research, but that is nevertheless cited by academics, can also attract many downloads. Overall, download counts can be used as an indicator of academic value, but only when contextualised with the purpose of the program.
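The download-citation comparison described above is a rank-correlation analysis over heavily skewed count data. A minimal sketch of that kind of calculation is below, using a self-written Spearman rank correlation (rank both variables, then take Pearson's correlation of the ranks, with average ranks for ties) on entirely hypothetical download and citation counts; the specific correlation method and data here are illustrative assumptions, not the paper's actual dataset or code.

```python
from statistics import mean

def ranks(values):
    """Return average ranks (1-based) for a list, handling ties."""
    sorted_vals = sorted(values)
    rank_map = {}
    i = 0
    while i < len(sorted_vals):
        j = i
        while j < len(sorted_vals) and sorted_vals[j] == sorted_vals[i]:
            j += 1
        # positions i..j-1 hold ties; their ranks are i+1..j, averaged
        rank_map[sorted_vals[i]] = (i + j + 1) / 2
        i = j
    return [rank_map[v] for v in values]

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical per-project counts (not from the study)
downloads = [12, 340, 5, 89, 2300, 45, 7, 150]
citations = [1, 8, 0, 3, 4, 6, 0, 5]
print(f"Spearman rho: {spearman(downloads, citations):.2f}")
```

Rank correlation is the natural choice here because download and citation counts are highly skewed, so a product-moment correlation on the raw counts would be dominated by a few very popular projects.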
