A mechanism-aware and multiomic machine-learning pipeline characterizes yeast cell growth

Significance

Linking genotype and phenotype is a fundamental problem in biology, key to several biomedical and biotechnological applications. Cell growth is a central phenotypic trait, resulting from interactions between environment, gene regulation, and metabolism, yet its functional bases are still not completely understood. We propose and test a machine-learning approach that integrates large-scale gene expression profiles and mechanistic metabolic models to characterize cell growth and understand its driving mechanisms in Saccharomyces cerevisiae. At its core, a custom-built multimodal learning method merges experimentally generated and model-generated data. We show that our approach can leverage the advantages of both machine learning and metabolic modeling, revealing unknown interactions between biological domains, incorporating mechanistic knowledge, and therefore overcoming black-box limitations of conventional data-driven approaches.

Abstract

Metabolic modeling and machine learning are key components in the emerging next generation of systems and synthetic biology tools, targeting the genotype–phenotype–environment relationship. It is becoming clear that their value is maximized when they are combined rather than used in isolation. However, the potential of integrating these two frameworks for omic data augmentation and integration is largely unexplored. We propose, rigorously assess, and compare machine-learning–based data integration techniques, combining gene expression profiles with computationally generated metabolic flux data to predict yeast cell growth. To this end, we create strain-specific metabolic models for 1,143 Saccharomyces cerevisiae mutants and we test 27 machine-learning methods, incorporating state-of-the-art feature selection and multiview learning approaches. We propose a multiview neural network using fluxomic and transcriptomic data, showing that the former increases the predictive accuracy of the latter and reveals functional patterns that are not directly deducible from gene expression alone. We test the proposed neural network on a further 86 strains generated in a different experiment, thereby verifying its robustness on an independent dataset. Finally, we show that introducing mechanistic flux features also improves the predictions for knockout strains whose genes were not modeled in the metabolic reconstruction. Our results thus demonstrate that fusing experimental cues with in silico models, based on known biochemistry, can contribute non-overlapping information toward biologically informed and interpretable machine learning. Overall, this study provides tools for understanding and manipulating complex phenotypes, increasing both the prediction accuracy and the extent of discernible mechanistic biological insights.
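As a rough illustration of what a multiview network over transcriptomic and fluxomic inputs can look like, the sketch below runs a single forward pass through two view-specific branches whose hidden representations are concatenated before a linear regression head. It is not the architecture used in the study: the two-branch layout, the layer sizes, the ReLU activation, the random untrained weights, and all names (`init_layer`, `dense_relu`, `multiview_predict`) are assumptions chosen for brevity, and the inputs are synthetic numbers rather than real expression or flux data.

```python
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def dense_relu(x, layer):
    """Apply one fully connected layer followed by a ReLU to the vector x."""
    weights, bias = layer
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def multiview_predict(expression, fluxes, params):
    """Encode each view separately, concatenate, and regress a growth value."""
    h_expression = dense_relu(expression, params["expression"])
    h_flux = dense_relu(fluxes, params["flux"])
    merged = h_expression + h_flux  # list concatenation fuses the two views
    head_weights, head_bias = params["head"]
    return sum(w * h for w, h in zip(head_weights[0], merged)) + head_bias[0]


# Hypothetical toy dimensions; real profiles have thousands of features.
N_GENES, N_FLUXES, HIDDEN = 6, 4, 3
rng = random.Random(0)
params = {
    "expression": init_layer(N_GENES, HIDDEN, rng),
    "flux": init_layer(N_FLUXES, HIDDEN, rng),
    "head": init_layer(2 * HIDDEN, 1, rng),
}

# Synthetic stand-ins for one strain's expression profile and flux vector.
expression = [rng.random() for _ in range(N_GENES)]
fluxes = [rng.random() for _ in range(N_FLUXES)]

growth = multiview_predict(expression, fluxes, params)
print(growth)
```

Keeping a separate branch per view lets each data type be encoded at its own scale before fusion. In practice the weights would be fitted to measured growth rates with a deep-learning library rather than left at their random initial values.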
