Code Convention Adherence in Research Data Infrastructure Software: An Exploratory Study

Science is rapidly evolving, incorporating technologies such as autonomous vehicles, high-throughput scientific instruments, high-fidelity numerical models, and sensor networks, all of which generate data with increasing frequency, variety, and volume. Scientists committed to open science want to share this data, which requires research data infrastructure (RDI). The software underlying RDI is often created and/or deployed by people without formal training in software engineering, or at organizations whose primary mandates do not include software development. Our understanding of software engineering as a field and practice does not universally translate to this software. As RDI software is pushed to handle larger data sets and is used to share data more widely, it is important to understand its maintainability, the resilience of its development community, and other indicators of long-term software project health. While there is a body of research on scientific software and on free and open source software, it is not known whether existing approaches to assessing these properties are effective for RDI software. In this exploratory study, we calculate one proxy measure for maintainability (code convention adherence) for a popular ocean data management system, and compare the results with those of four open source projects and with the apparent experience of users as captured in public mailing lists and an issue tracker. The results advance our limited understanding of this type of software and inform hypothesis generation and future research design.
