Self-assembling Peptide Discovery: Overcoming Human Bias With Machine Learning

Peptide materials have a wide array of functions, from tissue engineering and surface coatings to catalysis and sensing. This class of biopolymer consists of a sequence of the 20 naturally occurring amino acids, whose arrangement dictates the peptide's functionality. While it is highly desirable to tailor the amino acid sequence, a small increase in sequence length leads to a dramatic increase in the number of possible candidates (e.g., from 20^3 = 8,000 tripeptides to 20^5 = 3.2 million pentapeptides). Traditionally, peptide design is guided by structural propensity tables, hydrophobicity scales, or other desired properties, and typically yields fewer than 10 peptides per study, barely scratching the surface of the search space. These approaches, driven by human expertise and intuition, are not easily scalable and are prone to human bias. Here, we introduce a machine learning workflow that combines Monte Carlo tree search and random forest with molecular dynamics simulations to develop a fully autonomous computational search engine (named AI-expert) to discover peptide sequences with high potential for self-assembly (as a representative target functionality). We demonstrate that the AI-expert efficiently searches the large spaces of tripeptides and pentapeptides. Subsequent experiments on the proposed peptide sequences compare the predictive ability of the AI-expert with that of human experts. The AI performs on par with or better than human experts and suggests several non-intuitive sequences with high self-assembly propensity, highlighting its potential to overcome human bias and accelerate peptide discovery.
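The growth of the search space quoted in the abstract can be checked with a few lines of Python. This is a minimal sketch for illustration only, not part of the AI-expert workflow; the names `AMINO_ACIDS`, `search_space_size`, and `enumerate_peptides` are hypothetical, and the alphabet is assumed to be the one-letter codes of the 20 standard amino acids.

```python
from itertools import islice, product

# Assumption: one-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def search_space_size(length: int) -> int:
    """Number of distinct peptide sequences of the given length."""
    return len(AMINO_ACIDS) ** length


def enumerate_peptides(length: int):
    """Lazily yield every sequence of the given length as a string."""
    for residues in product(AMINO_ACIDS, repeat=length):
        yield "".join(residues)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"length {n}: {search_space_size(n):,} sequences")
    # Peek at the first few tripeptides without materializing all of them.
    print(list(islice(enumerate_peptides(3), 3)))
```

Enumeration is lazy on purpose: listing all tripeptides is cheap, but the count grows by a factor of 20 with every added residue, which is why exhaustive simulation of longer peptides quickly becomes impractical and a guided search is attractive.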
