ssbio: A Python Framework for Structural Systems Biology

Summary: Working with protein structures at the genome scale remains challenging. Here, we present ssbio, a Python package that provides a framework for working with structural information in the context of genome-scale network reconstructions, which can contain thousands of individual proteins. The ssbio package provides an automated pipeline to construct high-quality genome-scale models with protein structures (GEM-PROs), wrappers for popular third-party programs that compute associated protein properties, and methods to visualize and annotate structures directly in Jupyter notebooks. Together, these lower the barrier to linking 3D structural data with established systems workflows.

Availability and implementation: ssbio is implemented in Python and is available for download under the MIT license at http://github.com/SBRG/ssbio. Documentation and Jupyter notebook tutorials are available at http://ssbio.readthedocs.io/en/latest/. Interactive notebooks can be launched using Binder at https://mybinder.org/v2/gh/SBRG/ssbio/master?filepath=Binder.ipynb.

Supplementary information: Supplementary data are available at Bioinformatics online.
