Development of Scientific Software: A Systematic Mapping, a Bibliometrics Study, and a Paper Repository

Scientific and engineering research is heavily dependent on the effective development and use of software artifacts. Many of these artifacts are produced by the scientists themselves rather than by trained software engineers. To address the challenges in this area, a research community often referred to as "Development of Scientific Software" has emerged over the last few decades. As this research area has matured, the number of published papers and results has increased sharply, and it has therefore become important to summarize those studies and provide an overview of them. Through a systematic mapping and bibliometrics study, we reviewed 130 papers in this area, and we present the results of that study in this paper. We have also made the mapping data available in an online repository, which is planned to be updated on a regular basis. The results of our study suggest that many software engineering techniques and activities are being used in the development of scientific software. However, the usefulness of specific software engineering techniques in the scientific context still needs further exploration, for example techniques for software maintenance, evolution, refactoring, re-engineering and reverse engineering, and process and project management. We hope that this article will help researchers, especially those new to the area, get an overview of the research space and understand the trends in the area.
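A systematic mapping classifies each reviewed paper against a classification scheme and then tabulates frequencies and trends from the resulting data. The Python sketch below illustrates only that tallying step. It assumes a hypothetical record layout (`MappedPaper`, with `paper_id`, `year`, and `se_activities` fields). That layout is not the actual schema of the online repository, and the code is not the authors' tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedPaper:
    """One row of a hypothetical mapping spreadsheet (illustrative schema only)."""
    paper_id: str
    year: int
    se_activities: tuple  # classification facets assigned by the reviewers


def activity_frequencies(papers):
    """Count how many papers were classified under each SE activity."""
    counts = Counter()
    for p in papers:
        # set() ensures a paper is counted at most once per activity,
        # even if the same facet was recorded twice for it
        counts.update(set(p.se_activities))
    return counts


def papers_per_year(papers):
    """Publication trend: number of mapped papers per year, in year order."""
    return dict(sorted(Counter(p.year for p in papers).items()))


if __name__ == "__main__":
    # Placeholder records for demonstration; not data from the study
    sample = [
        MappedPaper("P1", 2008, ("testing",)),
        MappedPaper("P2", 2009, ("testing", "design")),
        MappedPaper("P3", 2009, ("maintenance",)),
    ]
    print(activity_frequencies(sample).most_common())
    print(papers_per_year(sample))
```

Keeping the classification data as plain records and deriving every count from them means that new papers added to the repository automatically update the frequency and trend tables.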
