Multi-omic data integration enables discovery of hidden biological regularities

Rapid growth in the size and complexity of biological data sets has led to the 'Big Data to Knowledge' challenge. We develop advanced data integration methods for multi-level analysis of genomic, transcriptomic, ribosome profiling, proteomic and fluxomic data. First, we show that pairwise integration of primary omics data reveals regularities that tie cellular processes together in Escherichia coli: the number of protein molecules made per mRNA transcript and the number of ribosomes required per translated protein molecule. Second, we show that genome-scale models, based on genomic and bibliomic data, enable quantitative synchronization of disparate data types. Integrating omics data with models enabled the discovery of two novel regularities: condition-invariant in vivo turnover rates of enzymes and a correlation between protein structural motifs and translational pausing. These regularities can be formally represented in a computable format, allowing for coherent interpretation and prediction of the fitness and selection that underlie cellular physiology.
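As a rough illustration of what pairwise integration can mean in practice, the sketch below matches two per-gene measurements and computes the number of protein molecules per mRNA transcript. It is a minimal sketch under stated assumptions, not the method used in this work: the function name `proteins_per_mrna`, the gene identifiers, and all copy numbers are hypothetical toy values chosen only to show the calculation.

```python
from statistics import median


def proteins_per_mrna(protein_copies, mrna_copies):
    """Return per-gene protein/mRNA ratios for genes measured in both data sets.

    Genes missing from the mRNA data, or with a non-positive mRNA count,
    are skipped because the ratio is undefined for them.
    """
    ratios = {}
    for gene, protein in protein_copies.items():
        mrna = mrna_copies.get(gene)
        if mrna is None or mrna <= 0:
            continue
        ratios[gene] = protein / mrna
    return ratios


# Hypothetical copies per cell (illustrative values, not measured data).
protein_copies = {"geneA": 1200.0, "geneB": 300.0, "geneC": 4500.0, "geneD": 80.0}
mrna_copies = {"geneA": 2.0, "geneB": 0.5, "geneC": 9.0, "geneE": 1.0}

ratios = proteins_per_mrna(protein_copies, mrna_copies)

# The median summarizes the typical ratio and is robust to outlier genes.
typical_ratio = median(ratios.values())

for gene in sorted(ratios):
    print(f"{gene}: {ratios[gene]:.0f} proteins per mRNA")
print(f"median: {typical_ratio:.0f} proteins per mRNA")
```

Only genes present in both data sets contribute, so `geneD` and `geneE` are dropped here. The same matching step would apply to the ribosomes-per-protein ratio, given ribosome profiling and proteomic measurements in place of the toy dictionaries.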
