Lasagna: Multifaceted Protein-Protein Interaction Prediction Based on Siamese Residual RCNN

Sequence-based protein-protein interaction (PPI) prediction is a fundamental problem in computational biology. Most prior work extracts predefined features from the sequences and then trains statistical learning algorithms on those features to classify PPIs. However, such explicit features are usually costly to extract and typically capture only part of the information relevant to PPIs. We therefore present Lasagna, an end-to-end framework for PPI prediction that uses only the primary sequences of a protein pair. Lasagna incorporates a deep residual recurrent convolutional neural network (RCNN) in a Siamese learning architecture, which leverages both robust local features and contextualized information that are significant for capturing the mutual influence of protein sequences. Our framework reduces the data pre-processing effort required by other systems and generalizes well to different application scenarios. Experimental evaluations show that Lasagna outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows promising performance on the more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short. The implementation of our framework is available at https://github.com/muhaochen/seq_ppi.git
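To make the architecture described above more concrete, the following is a minimal, untrained sketch of a Siamese encoder that combines convolution, pooling, and a recurrent layer with a residual shortcut. It is not the released implementation. Everything in it is an assumption made for illustration: the one-hot input encoding, the hidden size `DIM`, the plain tanh recurrent unit (standing in for whatever gated unit the framework uses), the single encoder block, the element-wise product used to merge the two sequence embeddings, and the names `Encoder` and `predict_interaction`.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIM = 8  # hypothetical hidden size, chosen only for this sketch


def init_matrix(rows, cols, rng):
    """Random (untrained) weight matrix as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def one_hot(seq):
    # Assumed input encoding: one indicator per standard amino acid.
    return [[1.0 if aa == c else 0.0 for c in AMINO_ACIDS] for aa in seq]


class Encoder:
    """One conv + pool + recurrent block with a residual shortcut."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_in = init_matrix(DIM, len(AMINO_ACIDS), rng)
        self.w_conv = [init_matrix(DIM, DIM, rng) for _ in range(3)]
        self.w_x = init_matrix(DIM, DIM, rng)
        self.w_h = init_matrix(DIM, DIM, rng)

    def conv_pool(self, xs):
        # Width-3 convolution with zero padding and ReLU (local features).
        zero = [0.0] * DIM
        padded = [zero] + xs + [zero]
        conv = []
        for t in range(len(xs)):
            acc = [0.0] * DIM
            for k in range(3):
                step = matvec(self.w_conv[k], padded[t + k])
                acc = [a + b for a, b in zip(acc, step)]
            conv.append([max(0.0, a) for a in acc])
        # Max pooling over non-overlapping windows of two positions.
        return [[max(col) for col in zip(*conv[i:i + 2])]
                for i in range(0, len(conv), 2)]

    def recurrent_residual(self, xs):
        # Plain tanh recurrence (contextual information) plus a shortcut
        # that adds each input back to the recurrent output.
        h = [0.0] * DIM
        out = []
        for x in xs:
            pre = [a + b for a, b in zip(matvec(self.w_x, x), matvec(self.w_h, h))]
            h = [math.tanh(p) for p in pre]
            out.append([hi + xi for hi, xi in zip(h, x)])
        return out

    def encode(self, seq):
        xs = [matvec(self.w_in, v) for v in one_hot(seq)]
        ys = self.recurrent_residual(self.conv_pool(xs))
        # Global average pooling gives a fixed-size sequence embedding.
        return [sum(col) / len(ys) for col in zip(*ys)]


def predict_interaction(encoder, w_out, seq_a, seq_b):
    """Siamese step: the same encoder (shared weights) embeds both sequences."""
    ea, eb = encoder.encode(seq_a), encoder.encode(seq_b)
    merged = [a * b for a, b in zip(ea, eb)]  # assumed merge operation
    logit = sum(w * m for w, m in zip(w_out, merged))
    return 1.0 / (1.0 + math.exp(-logit))


encoder = Encoder(seed=0)
out_rng = random.Random(1)
w_out = [out_rng.uniform(-0.5, 0.5) for _ in range(DIM)]
score = predict_interaction(encoder, w_out, "MKTAYIAKQR", "GSHMLEDPVA")
print(f"untrained interaction score: {score:.3f}")
```

Because both sequences pass through the same encoder and the merge is an element-wise product, the score does not depend on the order of the pair, which is one reason a Siamese design suits a symmetric relation such as protein-protein interaction. The weights here are random, so the printed score carries no biological meaning.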
