Kernelized Hashcode Representations for Biomedical Relation Extraction

Kernel methods have produced state-of-the-art results on several NLP tasks, including relation extraction, but they scale poorly because computing kernel similarities between discrete natural language structures is expensive. A recently proposed technique, kernelized locality-sensitive hashing (KLSH), can significantly reduce this computational cost, but it is only applicable to classifiers that operate on kNN graphs. Here we propose using random subspaces of KLSH codes to efficiently construct an explicit representation of NLP structures that is suitable for general classification methods. Further, we propose an approach for optimizing the KLSH model for classification problems by maximizing a variational lower bound on the mutual information between the KLSH codes (feature vectors) and the class labels. We evaluate the proposed approach on biomedical relation extraction datasets and observe significant and robust improvements in accuracy relative to state-of-the-art classifiers, along with orders-of-magnitude speedups compared to conventional kernel methods.
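The idea of turning kernel similarities into binary codes and then drawing random subspaces of those codes can be sketched as follows. This is an illustrative sketch only, not the method evaluated in this work: the RBF kernel stands in for a structural kernel over NLP structures, the bit rule (comparing mean similarity to two random subsets of a small reference set) is a simplified kernel-based hash rather than the KLSH construction, and all function names, sizes, and parameters are assumptions chosen for the demo.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Stand-in for an expensive structural kernel (assumption for this demo).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def sample_hash_functions(n_ref, n_bits, subset_size, rng):
    # Each bit is defined by two disjoint random subsets of the reference set.
    pairs = []
    for _ in range(n_bits):
        idx = rng.choice(n_ref, size=2 * subset_size, replace=False)
        pairs.append((idx[:subset_size], idx[subset_size:]))
    return pairs

def hashcodes(K, pairs):
    # K has shape (n_examples, n_ref). A bit is 1 when the example is, on
    # average, more similar to the first subset than to the second.
    bits = [K[:, a].mean(1) > K[:, b].mean(1) for a, b in pairs]
    return np.stack(bits, axis=1).astype(np.uint8)

def random_subspaces(n_bits, n_subspaces, bits_per_subspace, rng):
    # Each subspace is a random subset of bit positions.
    return [rng.choice(n_bits, size=bits_per_subspace, replace=False)
            for _ in range(n_subspaces)]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # toy examples
R = X[rng.choice(len(X), size=30, replace=False)]  # small reference set
K = rbf_kernel(X, R)                             # 200 x 30, not 200 x 200
pairs = sample_hash_functions(n_ref=30, n_bits=64, subset_size=4, rng=rng)
C = hashcodes(K, pairs)                          # explicit binary features
subspaces = random_subspaces(n_bits=64, n_subspaces=10,
                             bits_per_subspace=16, rng=rng)
views = [C[:, s] for s in subspaces]             # one feature view per subspace
```

Each example is compared only against the reference set, so the number of kernel evaluations grows with the reference-set size rather than with the full training set. Each entry of `views` is an ordinary binary feature matrix that could be passed to a general-purpose classifier, for example one ensemble member per subspace.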
