Machine Learning in Enzyme Engineering

Enzyme engineering plays a central role in developing efficient biocatalysts for biotechnology, biomedicine, and life sciences. Apart from classical rational design and directed evolution approache...
