Integration of the Rosetta suite with the Python software stack via reproducible packaging and core programming interfaces for distributed simulation

The Rosetta software suite for macromolecular modeling is a powerful computational toolbox for protein design, structure prediction, and protein structure analysis. Developing novel Rosetta‐based scientific tools requires two orthogonal skill sets: deep domain‐specific expertise in protein biochemistry, and technical expertise in developing, deploying, and analyzing molecular simulations. Furthermore, the computational demands of molecular simulation make large‐scale cluster‐based or distributed solutions necessary for nearly all scientifically relevant tasks. To lower the technical barrier to entry for new development, we integrated Rosetta with modern, widely adopted computational infrastructure. This simplifies deployment in large‐scale cluster and cloud computing environments and allows common libraries for simulation execution and data analysis to be reused. First, we integrated Rosetta with the Conda package manager, which simplifies installation into existing computational environments and packaging as Docker images for cloud deployment. Then, we developed programming interfaces that integrate Rosetta with the PyData stack for analysis and distributed computing, including the popular tools Jupyter, Pandas, and Dask. We demonstrate the utility of these components by generating a library of a thousand de novo disulfide‐rich miniproteins in a hybrid simulation that combined cluster‐based design with interactive, notebook‐based analysis. Our new tools enable users who would otherwise lack access to the necessary computational infrastructure to perform state‐of‐the‐art molecular simulation and design with Rosetta.
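The hybrid workflow described above follows a common pattern: many independent design trajectories are fanned out to workers, and their per-design metrics are gathered into a table for ranking and interactive analysis. The sketch below illustrates only that pattern under stated assumptions. `mock_design` is a hypothetical placeholder that draws a seeded random score instead of calling Rosetta, and the standard-library `ThreadPoolExecutor` stands in for a cluster scheduler; in a Dask setup, `Client.map` and `Client.gather` would play the fan-out and collection roles, and the gathered records could be loaded with `pandas.DataFrame(records)`. None of this is the authors' actual interface.

```python
from __future__ import annotations

import random
from concurrent.futures import ThreadPoolExecutor


def mock_design(seed: int) -> dict:
    """Hypothetical stand-in for one Rosetta design trajectory.

    A real task would run a Rosetta protocol; here we only draw a
    placeholder score from a seeded RNG so the example is reproducible.
    """
    rng = random.Random(seed)
    return {"design_id": seed, "total_score": round(rng.uniform(-100.0, 0.0), 2)}


def run_campaign(n_designs: int, max_workers: int = 4) -> list[dict]:
    """Fan independent design tasks out to a worker pool and gather results.

    ThreadPoolExecutor is a local stand-in for a distributed scheduler
    (e.g. a Dask client); executor.map returns results in submission order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(mock_design, range(n_designs)))


def top_designs(records: list[dict], k: int) -> list[dict]:
    """Return the k best designs; lower Rosetta-style scores are more favorable."""
    return sorted(records, key=lambda r: r["total_score"])[:k]


if __name__ == "__main__":
    results = run_campaign(n_designs=20)
    for row in top_designs(results, k=3):
        print(row)
```

Because each trajectory is independent, the same driver scales from a laptop thread pool to cluster workers by swapping only the executor, while the analysis step operates on the same list of records.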
