Better together: Elements of successful scientific software development in a distributed collaborative community

Many scientific disciplines rely on computational methods for data analysis, model generation, and prediction. These methods are often implemented by researchers with domain expertise but without formal training in software engineering or computer science. As a result, the sustainability and maintainability of scientific software tools developed in academic environments are often underappreciated. Some software tools have avoided this fate, including the scientific library Rosetta. We use this software and its community as a case study to show how modern software development can be accomplished successfully, irrespective of subject area. Rosetta is one of the largest software suites for macromolecular modeling, with 3.1 million lines of code and many state-of-the-art applications. Since the mid-1990s, the software has been developed collaboratively by the RosettaCommons, a community of academics from over 60 institutions worldwide with diverse backgrounds including chemistry, biology, physiology, physics, engineering, mathematics, and computer science. Developing this software suite has given us more than two decades of experience in developing advanced scientific software effectively in a global community with hundreds of contributors. Here we illustrate how this development community functions by addressing technical aspects (such as version control, testing, and maintenance), community-building strategies, diversity efforts, software dissemination, and user support. We demonstrate how modern computational research can thrive in a distributed collaborative community. The practices described here are independent of subject area and can be readily adopted by other software development communities.
