Understanding the scientific software ecosystem and its impact: Current and future measures

Software is increasingly important to the scientific enterprise, and science-funding agencies are increasingly funding software work. Accordingly, many different participants need insight into the relationship between software, its development, its use, and its scientific impact. In this article, we draw on interviews and participant observation to describe the information needs of domain scientists, software component producers, infrastructure providers, and ecosystem stewards, including science funders. We provide a framework for categorizing different types of measures and their relationships as they extend from funding, through development and scientific use, to scientific impact. We use this framework to organize a presentation of existing measures and techniques, and to identify areas in which techniques are either not widespread or entirely missing. We conclude with policy recommendations designed to improve insight into the scientific software ecosystem, make it more understandable, and thereby contribute to the progress of science.