ATSE: a peptide toxicity predictor by exploiting structural and evolutionary information based on graph neural network and attention mechanism

MOTIVATION Peptides have recently emerged as promising therapeutic agents against various diseases. For both research and safety regulation, it is important to develop computational methods that accurately predict the potential toxicity of peptides among the vast number of candidates.

RESULTS In this study, we propose ATSE, a peptide toxicity predictor that exploits structural and evolutionary information using graph neural networks and an attention mechanism. It consists of four modules: (i) a sequence processing module that converts peptide sequences into molecular graphs and evolutionary profiles, (ii) a feature extraction module that learns discriminative features from the graph structural information and the evolutionary information, (iii) an attention module that optimizes the features and (iv) an output module that classifies a peptide as toxic or non-toxic using the optimized features from the attention module.

CONCLUSION Comparative studies demonstrate that ATSE significantly outperforms all competing methods. We found that structural information is complementary to evolutionary information and effectively improves predictive performance. Importantly, the data-driven features learned by ATSE can be interpreted and visualized, providing additional information for further analysis. Moreover, we present a user-friendly online platform that implements ATSE, available at http://server.malab.cn/ATSE. We expect it to be a powerful and useful tool for interested researchers.
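The four-module flow described in the abstract can be illustrated with a toy sketch. This is not the ATSE implementation: every function name here is hypothetical, the graph is a residue-level chain rather than a true molecular graph, amino acid composition stands in for an evolutionary profile, a single neighbour-averaging step stands in for a trained graph neural network, and the attention query and classifier weights are untrained placeholders.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_to_chain_graph(seq):
    """Module (i), simplified: nodes are residues, edges link neighbours.
    Assumption: a residue-level chain stands in for a molecular graph."""
    return list(seq), [(i, i + 1) for i in range(len(seq) - 1)]


def structural_features(nodes, edges):
    """Module (ii), structural view: one round of neighbour averaging
    over one-hot residue vectors, followed by mean pooling."""
    onehot = [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in nodes]
    neighbours = {i: [i] for i in range(len(nodes))}  # include self-loop
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    updated = [
        [sum(onehot[n][k] for n in neighbours[i]) / len(neighbours[i])
         for k in range(len(AMINO_ACIDS))]
        for i in range(len(nodes))
    ]
    return [sum(row[k] for row in updated) / len(updated)
            for k in range(len(AMINO_ACIDS))]


def evolutionary_features(seq):
    """Module (ii), evolutionary view. Assumption: plain amino acid
    composition is only a placeholder for a real evolutionary profile."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(views, query):
    """Module (iii), simplified: score each view by its dot product with
    a query vector, normalise with softmax, return the weighted sum."""
    scores = [sum(v * q for v, q in zip(view, query)) for view in views]
    weights = softmax(scores)
    fused = [sum(w * view[k] for w, view in zip(weights, views))
             for k in range(len(query))]
    return fused, weights


def predict_toxic(fused, w, b):
    """Module (iv), simplified: logistic output on the fused features."""
    logit = sum(x * wi for x, wi in zip(fused, w)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def score_peptide(seq, query, w, b):
    """Run the four steps in order for a non-empty peptide sequence."""
    nodes, edges = sequence_to_chain_graph(seq)
    views = [structural_features(nodes, edges), evolutionary_features(seq)]
    fused, weights = attention_fuse(views, query)
    return predict_toxic(fused, w, b), weights
```

Calling `score_peptide("ACDKLW", [1.0] * 20, [0.0] * 20, 0.0)` returns a probability and the two attention weights. With untrained placeholder parameters the probability carries no meaning; the weights only show how the attention step would balance the structural and evolutionary views once a query has been learned.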

[32]  Xiangrong Liu,et al.  deepDR: a network-based deep learning approach to in silico drug repositioning , 2019, Bioinform..

[33]  Rong Chen,et al.  HBPred: a tool to identify growth hormone-binding proteins , 2018, International journal of biological sciences.

[34]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[35]  David J. Craik,et al.  ConoServer: updated content, knowledge, and discovery tools in the conopeptide database , 2011, Nucleic Acids Res..

[36]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[37]  Chanin Nantasenamat,et al.  iBitter-SCM: Identification and characterization of bitter peptides using a scoring card method with propensity scores of dipeptides. , 2020, Genomics.

[38]  Dong Xu,et al.  scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses , 2020, Nature Communications.

[39]  Adam Godzik,et al.  Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences , 2006, Bioinform..

[40]  Kuo-Chen Chou,et al.  Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..

[41]  Yu Yao,et al.  ConvsPPIS: Identifying Protein-protein Interaction Sites by an Ensemble Convolutional Neural Network with Feature Graph , 2020, Current Bioinformatics.

[42]  Q. Kaas,et al.  ArachnoServer: a database of protein toxins from spiders , 2009, BMC Genomics.

[43]  Xiaozhao Fang,et al.  Protein fold recognition based on multi-view modeling , 2019, Bioinform..

[44]  Etienne Weiss,et al.  Therapeutic antibodies: successes, limitations and hopes for the future , 2009, British journal of pharmacology.

[45]  Xiangxiang Zeng,et al.  Application of deep learning methods in biological networks , 2020, Briefings Bioinform..

[46]  Kumardeep Chaudhary,et al.  An in silico platform for predicting, screening and designing of antihypertensive peptides , 2015, Scientific Reports.

[47]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[48]  Fei Chen,et al.  Extraordinary metabolic stability of peptides containing α-aminoxy acids , 2011, Amino Acids.

[49]  Xiangxiang Zeng,et al.  Target identification among known drugs by deep learning from heterogeneous networks , 2020, Chemical science.

[50]  D. Craik,et al.  The Future of Peptide‐based Drugs , 2013, Chemical biology & drug design.

[51]  M. Khrestchatisky,et al.  Synthetic therapeutic peptides: science and market. , 2010, Drug discovery today.

[52]  Xiangrong Liu,et al.  Identifying enhancer-promoter interactions with neural network based on pre-trained DNA vectors and attention mechanism , 2019, Bioinform..

[53]  Bin Liu,et al.  MotifCNN-fold: protein fold recognition based on fold-specific features extracted by motif-based convolutional neural networks , 2019, Briefings Bioinform..

[54]  HaiXia Long,et al.  Deep Convolutional Neural Networks for Predicting Hydroxyproline in Proteins , 2017 .

[55]  Junjie Chen,et al.  Protein remote homology detection based on bidirectional long short-term memory , 2017, BMC Bioinformatics.

[56]  Bin Liu,et al.  DeepSVM-fold: protein fold recognition by combining support vector machines and pairwise sequence similarity scores generated by deep learning networks , 2019, Briefings Bioinform..

[57]  S. Ikramuddin,et al.  Effects on GLP-1, PYY, and leptin by direct stimulation of terminal ileum and cecum in humans: implications for ileal transposition. , 2014, Surgery for obesity and related diseases : official journal of the American Society for Bariatric Surgery.

[58]  Rahul Kumar,et al.  In Silico Approach for Predicting Toxicity of Peptides and Proteins , 2013, PloS one.

[59]  Avinash Sonawane,et al.  Antimicrobial peptides and proteins in mycobacterial therapy: current status and future prospects. , 2014, Tuberculosis.

[60]  L. Gentilucci,et al.  Chemical modifications designed to improve peptide stability: incorporation of non-natural amino acids, pseudo-peptide bonds, and cyclization. , 2010, Current pharmaceutical design.

[61]  Irena Roterman-Konieczna,et al.  Protein Secondary Structure Prediction: A Review of Progress and Directions , 2020, Current Bioinformatics.

[62]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[63]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[64]  Tetsuya Sakurai,et al.  Robust Similarity Measure for Spectral Clustering Based on Shared Neighbors , 2016 .

[65]  Phasit Charoenkwan,et al.  iAMY-SCM: Improved prediction and analysis of amyloid proteins using a scoring card method with propensity scores of dipeptides. , 2020, Genomics.

[66]  S Rackovsky,et al.  Sequence-, structure-, and dynamics-based comparisons of structurally homologous CheY-like proteins , 2017, Proceedings of the National Academy of Sciences.