Sequence-based peptide identification, generation, and property prediction with deep learning: a review

This article reviews recent work that uses deep learning algorithms to identify and generate functional peptides as well as predict their biological properties.
