Identifying Protein-protein Interactions in Biomedical Literature using Recurrent Neural Networks with Long Short-Term Memory

In this paper, we propose a recurrent neural network (RNN) model with long short-term memory for identifying protein-protein interactions in biomedical literature. Experiments on the two largest public benchmark datasets, AIMed and BioInfer, demonstrate that our approach significantly surpasses state-of-the-art methods, with relative improvements of 10% and 18%, respectively. Cross-corpus evaluation also demonstrates that the proposed model remains robust when it is trained on one corpus and tested on the other. These results suggest that RNNs can effectively capture semantic relationships among proteins and generalize across corpora without any feature engineering.
