Discovering de novo peptide substrates for enzymes using machine learning

The discovery of peptide substrates for enzymes with exclusive, selective activities is a central goal in chemical biology. In this paper, we develop a hybrid computational and biochemical method to rapidly optimize peptides for specific, orthogonal biochemical functions. The method is an iterative machine learning process in which experimental data are fed into a mathematical algorithm that selects potential peptide substrates to be tested experimentally. Once these candidates are tested, the algorithm uses the new experimental data to refine future selections. This process is repeated until a suitable set of de novo peptide substrates is discovered. We employed this technology to discover orthogonal peptide substrates for 4'-phosphopantetheinyl transferases, an enzyme class that covalently modifies proteins. In this manner, we have demonstrated that machine learning can be leveraged to guide peptide optimization for specific biochemical functions not immediately accessible by biological screening techniques, such as phage display and random mutagenesis.
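The iterative select, test, and refine loop described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the model or the selection rule, so this sketch assumes a toy per-position scoring model and a greedy top-k selection. The wet-lab experiment is replaced by a simulated assay, and every name here (`simulated_assay`, `HIDDEN_TARGET`, `fit_model`, `predict`, `propose`, `optimize`) is hypothetical.

```python
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Arbitrary sequence used only by the simulated assay (an assumption for
# illustration; it has no biological meaning).
HIDDEN_TARGET = "MKTAYIAKQRQ"


def simulated_assay(peptide):
    """Stand-in for the experiment: fraction of positions matching HIDDEN_TARGET."""
    matches = sum(a == b for a, b in zip(peptide, HIDDEN_TARGET))
    return matches / len(HIDDEN_TARGET)


def fit_model(results):
    """Toy surrogate: mean measured activity for each (position, residue) pair."""
    sums, counts = defaultdict(float), defaultdict(int)
    for peptide, activity in results.items():
        for key in enumerate(peptide):
            sums[key] += activity
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(model, peptide, prior=0.0):
    """Predicted activity: mean per-position score; unseen pairs get the prior."""
    return sum(model.get(key, prior) for key in enumerate(peptide)) / len(peptide)


def propose(results, rng, n_draws=200):
    """Candidate pool: untested single-residue mutants of the best peptides so far."""
    best = sorted(results, key=results.get, reverse=True)[:5]
    candidates = set()
    for _ in range(n_draws):
        child = list(rng.choice(best))
        child[rng.randrange(len(child))] = rng.choice(AMINO_ACIDS)
        child = "".join(child)
        if child not in results:
            candidates.add(child)
    return sorted(candidates)  # sorted for reproducibility


def optimize(rounds=10, batch_size=10, seed=0):
    """Run the loop: fit model, select a batch, 'test' it, and repeat."""
    rng = random.Random(seed)
    results = {}
    # Round 0: a random initial batch, since no data exist yet.
    while len(results) < batch_size:
        peptide = "".join(rng.choice(AMINO_ACIDS) for _ in HIDDEN_TARGET)
        results[peptide] = simulated_assay(peptide)
    history = [max(results.values())]
    for _ in range(rounds):
        model = fit_model(results)                     # learn from all data so far
        candidates = propose(results, rng)
        batch = sorted(candidates, key=lambda p: predict(model, p),
                       reverse=True)[:batch_size]      # greedy selection
        for peptide in batch:                          # experimental testing step
            results[peptide] = simulated_assay(peptide)
        history.append(max(results.values()))
    return results, history


if __name__ == "__main__":
    results, history = optimize()
    print("best activity per round:", [round(h, 2) for h in history])
```

The greedy rule only exploits the current model. A Bayesian optimization approach would typically replace `predict` with a criterion that also rewards uncertainty, so that unexplored regions of sequence space are sampled as well.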
