PHAT: interpretable prediction of peptide secondary structures

Peptides have recently emerged as therapeutic molecules against various diseases, and the secondary structure of a peptide is a crucial determinant of its bioactivity. However, accurately predicting peptide secondary structures remains challenging because peptide sequence data are scarce and because limitations in feature engineering lower prediction efficiency. We therefore developed PHAT, a deep learning framework based on a hypergraph multi-head attention network and transfer learning for the prediction of peptide secondary structures. Comparative results demonstrated the outstanding performance and robustness of PHAT. In particular, PHAT automatically learns biologically meaningful knowledge about secondary sub-structures, overcoming the “black-box” limitation of deep learning-based models and providing good interpretability. Additionally, we demonstrated that the structure information derived from PHAT significantly improved the performance of downstream tasks such as the prediction of peptide toxicity and protein-peptide binding sites. Importantly, we further explored the applicability of PHAT to contact map prediction, which can aid in the reconstruction of peptide 3-D structures, thus highlighting the versatility of our model. To facilitate the use of PHAT, we established an online server, accessible at https://server.wei-group.net/PHAT/. We expect our work to assist in the design of functional peptides and to contribute to the advancement of structural biology research.
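For readers unfamiliar with contact maps, the following minimal sketch shows what such a map encodes. It is not the PHAT model. It only derives a binary contact map from known per-residue coordinates, assuming one representative atom per residue (for example Cα) and a distance cutoff of 8 Å. The function name `contact_map`, the cutoff value, and the toy coordinates are all illustrative assumptions.

```python
import math


def contact_map(coords, threshold=8.0):
    """Return a symmetric binary contact map from per-residue 3-D coordinates.

    coords: list of (x, y, z) tuples, one per residue (assumed here to be
            C-alpha positions in angstroms).
    threshold: distance cutoff in angstroms; 8.0 is an assumed, commonly
               used value, not one taken from the paper.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Two residues are "in contact" if they lie closer than the cutoff.
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy example (hypothetical data): four residues spaced 3.8 angstroms apart
# along a straight line.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(toy)
for row in cmap:
    print(row)
```

A predictor works in the opposite direction: it estimates such a matrix from the sequence alone, and the predicted contacts can then serve as distance restraints when reconstructing a 3-D structure.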
