Nearly-Unsupervised Hashcode Representations for Relation Extraction

Kernelized locality-sensitive hashcodes have recently been employed successfully as representations of natural language text, proving especially relevant to biomedical relation extraction tasks. In this paper, we propose to optimize the hashcode representations in a nearly unsupervised manner, using only the data points, not their class labels, for learning. The optimized hashcode representations are then fed to a supervised classifier, following prior work. This nearly unsupervised approach allows fine-grained optimization of each hash function, which is particularly suitable for building hashcode representations that generalize from a training set to a test set. We empirically evaluate the proposed approach on biomedical relation extraction tasks, obtaining significant accuracy improvements over state-of-the-art supervised and semi-supervised approaches.
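The pipeline described above (kernelized hashcodes as text representations, fed to a supervised classifier) can be sketched minimally as follows. This is not the paper's method: the k-gram Jaccard kernel, the landmark set, and the random-hyperplane hash functions are all simplifying assumptions chosen for illustration; the paper instead optimizes each hash function from unlabeled data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def kernel(x, y, k=3):
    # Toy string kernel (assumption): Jaccard overlap of character k-grams.
    a = {x[i:i + k] for i in range(len(x) - k + 1)}
    b = {y[i:i + k] for i in range(len(y) - k + 1)}
    return len(a & b) / max(1, len(a | b))

def hashcodes(texts, refs, weights):
    # K[i, j] = kernel similarity of text i to landmark (reference) example j.
    K = np.array([[kernel(t, r) for r in refs] for t in texts])
    # Each hash bit thresholds a random signed combination of similarities,
    # i.e. a random hyperplane in the kernel-induced feature space.
    return (K @ weights > 0).astype(int)

rng = np.random.default_rng(0)
train = ["protein A binds protein B", "gene X regulates gene Y",
         "no interaction reported here", "the weather is nice today"]
labels = [1, 1, 0, 0]
refs = train                                # landmarks need no labels
W = rng.standard_normal((len(refs), 16))    # 16 random hash functions
H = hashcodes(train, refs, W)               # binary codes, shape (4, 16)

# Following the two-stage design: hashcodes in, supervised classifier on top.
clf = LogisticRegression().fit(H, labels)
pred = clf.predict(hashcodes(["protein C binds protein D"], refs, W))
```

Note that only the final classifier touches the labels; the hashcode construction uses unlabeled text alone, which is the property the nearly unsupervised optimization exploits.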
