A general instance representation architecture for protein-protein interaction extraction

Previous research has shown that supervised Protein-Protein Interaction Extraction (PPIE) can achieve high accuracy with carefully selected features and kernels. However, most of these features and kernels depend on domain knowledge and natural language analysis, which makes supervised models expensive, heavy and brittle. Moreover, one-hot encoding, a commonly used representation technique, fails to capture the semantic similarity between words. To reduce manual effort and overcome this shortcoming of one-hot encoding, we propose a general instance representation architecture for PPIE that integrates word representation and vector composition. Our method obtains F-scores of 69.4%, 78.8%, 76.0%, 74.0% and 81.1% on AIMed, BioInfer, HPRD50, IEPA and LLL, respectively.
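The abstract contrasts one-hot encoding with dense word representations and says the architecture composes word vectors into an instance vector, but it does not specify the composition operator. The Python sketch below illustrates both ideas under stated assumptions. The three-dimensional embeddings are hypothetical toy values (real ones would be learned from unlabeled text, for example with word2vec). Averaging is assumed as the composition operator. The names `one_hot`, `cosine`, `compose` and `EMBEDDINGS` are illustrative and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy embeddings, hand-picked for illustration only.
# Real word representations would be learned from a large unlabeled corpus.
EMBEDDINGS: Dict[str, Vector] = {
    "binds": [0.9, 0.1, 0.0],
    "interacts": [0.8, 0.2, 0.1],
    "with": [0.0, 0.1, 0.9],
}


def one_hot(word: str, vocab: Sequence[str]) -> Vector:
    """Return the one-hot vector of `word` over `vocab`."""
    return [1.0 if w == word else 0.0 for w in vocab]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compose(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Compose an instance vector by averaging known word vectors (assumed operator).

    Tokens without an embedding are skipped; the result has a fixed dimension
    regardless of how many tokens the instance contains.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("no token of the instance has an embedding")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    vocab = sorted(EMBEDDINGS)
    # Distinct one-hot vectors are orthogonal, so related words look unrelated.
    print(cosine(one_hot("binds", vocab), one_hot("interacts", vocab)))  # 0.0
    # Dense vectors can place related words close together.
    print(round(cosine(EMBEDDINGS["binds"], EMBEDDINGS["interacts"]), 2))  # 0.98
    # Two instances composed into fixed-length vectors and compared.
    a = compose(["binds", "with"], EMBEDDINGS)
    b = compose(["interacts", "with"], EMBEDDINGS)
    print(round(cosine(a, b), 2))
```

Any two distinct one-hot vectors have cosine similarity 0, however related the words are, whereas the toy embeddings for "binds" and "interacts" score about 0.98. Averaging also yields a vector of fixed dimension for instances of any length, which a standard classifier can consume directly; the paper's actual composition may differ.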