Towards OSGeo Best Practices for Scientific Software Citation: Integration Options for Persistent Identifiers in OSGeo Project Repositories

As a contribution to the ongoing larger effort to establish Open Science as best practice in academia, this article focuses on the Open Source and Open Access tiers of the Open Science triad and on community software projects. The current situation of research software development, and the need to recognize it as a significant contribution to science, is introduced in relation to Open Science. The Open Science paradigms are being adopted at different speeds and on different levels across the various fields of science and cross-cutting software communities. This is paralleled by the emergence of an underlying, future-safe technical infrastructure based on open standards, which enables proper recognition of published articles, data, and software. Currently, the number of journal publications about research software remains low compared to the amount of research code published in various software repositories on the web. Because common standards for citing software projects (containers) and software versions are lacking, the FORCE11 group and the CodeMeta project recommend establishing Persistent Identifiers (PIDs), together with suitable metadata sets, to reliably cite research software. This approach is compared to the best practices implemented by the OSGeo Foundation for geospatial community software projects. For GRASS GIS, an OSGeo project and one of the oldest geospatial open source community projects, the external requirements for Digital Object Identifier (DOI)-based software citation are compared with the project's software documentation standards. Based on this status assessment, application scenarios are derived for how OSGeo projects can approach DOI-based software citation, both as a standalone option and as a means to foster open access journal publications as part of reproducible Open Science.

∗Corresponding author. Email address: ploewe@diw.de (Peter Löwe)
Submitted to FOSS4G 2017 Conference Proceedings, Boston, USA.
September 20, 2017
FOSS4G 2017 Academic Program Towards OSGeo Best Practices

1. Research Code, The Workhorse of Science

Software has become the cross-disciplinary workhorse of science (Brett et al. 2017). Software developed for research is usually created by scientists themselves rather than by hired professional software developers. According to a survey among 24 universities presented by Brett et al. (2017), 92% of the researchers used scientific software, 67% stated that it is fundamental to their research, and 50% developed their own software. This is both a potential bottleneck and an opportunity for improvement, as the coding skills of researchers directly affect the quality of scientific results. Effectively, research is becoming dependent upon advances in software, which spans a diverse range of research tools, including operating systems, applications, models, algorithms, middleware, and code libraries (Katz 2017). Scientists should be given sufficient motivation to hone their programming skills, to publish their research software as part of the scientific process, and to care about its long-term provision to the science infrastructure. This motivation should be provided in the coin of the realm of academia, namely recognition of impact and reuse through scientific credit, i.e. citation. This in turn must be grounded in reliable social and technical infrastructures. Otherwise, scientific code is in most cases only advanced to the readiness level of a demonstrator or pilot ("works for me"), as the scope of research projects does not include a drive for code longevity, such as refactoring or post-project maintenance. Scientists should know how to find all relevant software packages for their field of work, either as a starting point for their own work or as a long-term repository for the legacy of their resulting research code.
This is a challenge, as software projects and code repositories on the web (e.g., GitHub, SourceForge, or Bitbucket) follow lifecycles: they appear, rise in popularity, and eventually fall out of use when the user base moves on to next-generation software, starting the cycle anew. A pragmatic solution is to place reasonable trust in software projects that have been used over a long period of time and that have stable user and developer communities to ensure updates of the codebase and features, as well as reliable and open communication channels and governance models. Compiling a personal survey of such state-of-the-art software projects would require significant effort from the individual scientist, yet such surveys of geospatial tools have only been published infrequently in journals (e.g., Steiniger and Hunter 2013). A viable alternative is offered by self-organized federations of software projects that act as impartial arbiters and provide comparable assessments of software alternatives, such as the OSGeo umbrella organization.