Systems biology approaches integrated with artificial intelligence for optimized metabolic engineering

Metabolic engineering aims to maximize the production of bio-economically important substances (compounds, enzymes, or other proteins) by optimizing the genetics, cellular processes and growth conditions of microorganisms. This requires a detailed understanding of the metabolic pathways involved in producing the targeted substances, and of how the engineering regulates cellular processes or growth conditions. To achieve this goal, a large array of experimental techniques, compound libraries, computational methods and data resources, including multi-omics data, is used. The recent advent of multi-omics systems biology approaches has significantly impacted the field by opening new avenues for dynamic and large-scale analyses that deepen our knowledge of these manipulations. However, with the enormous volume of transcriptomics, proteomics and metabolomics data available, integrating the data for a more holistic understanding is a daunting task. Novel data mining and analytics approaches, including artificial intelligence (AI), can provide breakthroughs that traditional low-throughput, experiment-only methods cannot easily achieve. Here, we review the latest attempts to combine systems biology and AI in metabolic engineering research, and highlight how this alliance can help overcome the current challenges facing industrial biotechnology, especially the production of food-related substances and compounds using microorganisms.
