Efficient Representation for Natural Language Processing via Kernelized Hashcodes

Kernel methods have been used widely in a number of tasks, but have had limited success in Natural Language Processing (NLP) due to the high cost of computing kernel similarities between discrete natural language structures. A recently proposed technique, Kernelized Locality Sensitive Hashing (KLSH), can significantly reduce the computational cost, but is only applicable to classifiers operating on kNN graphs. Here we propose to use random subspaces of KLSH codes to efficiently construct explicit representations of natural language structures that are suitable for general classification methods. Further, we propose an approach for optimizing a KLSH model for classification problems by maximizing a variational lower bound on the mutual information between the KLSH codes (feature vectors) and the class labels. We apply the proposed approach to a biomedical information extraction task and observe robust improvements in accuracy, along with a significant speedup compared to conventional kernel methods.
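
The idea of turning kernel similarities into hashcodes, and then taking random subspaces of those codes as feature vectors, can be illustrated with a small sketch. This is not the authors' implementation: the token-overlap kernel, the two-group hash function, and all names and sizes below (`token_kernel`, `make_hash_functions`, `random_subspaces`, the reference sentences) are assumptions chosen for illustration. The point it shows is that each structure needs kernel evaluations only against a small reference set, after which every bit and every subspace is cheap to derive.

```python
import random


def token_kernel(a, b):
    """Toy similarity: Jaccard overlap of token sets (stand-in for a structure kernel)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def make_hash_functions(num_references, num_bits, group_size, rng):
    """Define each bit by two disjoint random groups of reference indices."""
    functions = []
    for _ in range(num_bits):
        chosen = rng.sample(range(num_references), 2 * group_size)
        functions.append((chosen[:group_size], chosen[group_size:]))
    return functions


def hashcode(x, references, functions, kernel):
    """Evaluate the kernel once per reference, then derive all bits from those values."""
    sims = [kernel(x, r) for r in references]
    return [
        1 if max(sims[i] for i in g1) > max(sims[i] for i in g0) else 0
        for g1, g0 in functions
    ]


def random_subspaces(num_bits, num_subspaces, subspace_size, rng):
    """Pick random subsets of bit positions; each subset yields one feature vector."""
    return [
        sorted(rng.sample(range(num_bits), subspace_size))
        for _ in range(num_subspaces)
    ]


def project(code, subspace):
    """Restrict a hashcode to the bit positions of one subspace."""
    return [code[i] for i in subspace]


# Hypothetical reference set; in practice this would be sampled from training data.
references = [
    "protein A binds protein B",
    "gene X activates gene Y",
    "protein A inhibits gene Y",
    "drug Z binds receptor R",
    "gene X is expressed in tissue T",
    "receptor R activates protein B",
]

rng = random.Random(0)
functions = make_hash_functions(len(references), num_bits=8, group_size=2, rng=rng)
subspaces = random_subspaces(num_bits=8, num_subspaces=3, subspace_size=4, rng=rng)

sentence = "protein B binds receptor R"
code = hashcode(sentence, references, functions, token_kernel)
features = [project(code, s) for s in subspaces]
```

Each entry of `features` is an explicit binary vector that could be passed to any standard classifier, for example one ensemble member per subspace. Optimizing the hash functions against the class labels, as the abstract describes, is not covered by this sketch.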

[1]  Hady Wirawan Lauw,et al.  A Convolution Kernel Approach to Identifying Comparisons in Text , 2015, ACL.

[2]  Daniel Marcu,et al.  Biomedical Event Extraction using Abstract Meaning Representation , 2017, BioNLP.

[3]  Shih-Fu Chang,et al.  Semi-supervised hashing for scalable image retrieval , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  David Haussler,et al.  Convolution kernels on discrete structures , 1999 .

[5]  Larry S. Davis,et al.  2009 IEEE 12th International Conference on Computer Vision (ICCV) , 2009 .

[6]  Alexander A. Alemi,et al.  Deep Variational Information Bottleneck , 2017, ICLR.

[7]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[8]  Sahil Garg,et al.  Stochastic Learning of Nonstationary Kernels for Natural Language Modeling , 2018, ArXiv.

[9]  Dan Roth,et al.  Multi-core Structural SVM Training , 2013, ECML/PKDD.

[10]  Shih-Fu Chang,et al.  Spherical hashing , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[12]  Ulf Leser,et al.  A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature , 2010, PLoS Comput. Biol..

[13]  Ping Li,et al.  Hashing Algorithms for Large-Scale Learning , 2011, NIPS.

[14]  Olivier Marre,et al.  Relevant sparse codes with variational information bottleneck , 2016, NIPS.

[15]  Carlo Strapparava,et al.  Domain Kernels for Word Sense Disambiguation , 2005, ACL.

[16]  Sampo Pyysalo,et al.  Overview of BioNLP’09 Shared Task on Event Extraction , 2009, BioNLP@HLT-NAACL.

[17]  Lucia Specia,et al.  Learning Structural Kernels for Natural Language Processing , 2015, TACL.

[18]  ZhouGuodong,et al.  Tree kernel-based protein-protein interaction extraction from biomedical literature , 2012 .

[19]  Giovanni Maria Farinella,et al.  MACHINE LEARNING IN COMPUTER VISION , 2002 .

[20]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[21]  David Barber,et al.  The IM algorithm: a variational approach to Information Maximization , 2003, NIPS 2003.

[22]  Shuicheng Yan,et al.  Non-Metric Locality-Sensitive Hashing , 2010, AAAI.

[23]  Heng Ji,et al.  Two-Stage Hashing for Fast Document Retrieval , 2014, ACL.

[24]  Wei Liu,et al.  Hashing with Graphs , 2011, ICML.

[25]  BMC Bioinformatics , 2005 .

[26]  Heng Tao Shen,et al.  Hashing for Similarity Search: A Survey , 2014, ArXiv.

[27]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[28]  Peyman Milanfar,et al.  Action Recognition from One Example , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Kristen Grauman,et al.  Learning Binary Hash Codes for Large-Scale Image Search , 2013, Machine Learning for Computer Vision.

[30]  Daniel Marcu,et al.  Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical Text , 2015, AAAI.

[31]  Daniel Marcu,et al.  Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation , 2015, EMNLP.

[32]  Sampo Pyysalo,et al.  Overview of BioNLP Shared Task 2013 , 2013, BioNLP@ACL.

[33]  Hongtao Lu,et al.  Locality Preserving Hashing , 2014, AAAI.

[34]  Michael Collins,et al.  Convolution Kernels for Natural Language , 2001, NIPS.

[35]  Razvan C. Bunescu,et al.  Subsequence Kernels for Relation Extraction , 2005, NIPS.

[36]  Hideki Isozaki,et al.  Efficient Support Vector Classifiers for Named Entity Recognition , 2002, COLING.

[37]  Richard Johansson,et al.  Relational Features in Fine-Grained Opinion Analysis , 2013, CL.

[38]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Kristen Grauman,et al.  Kernelized Locality-Sensitive Hashing , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Kilian Q. Weinberger,et al.  Feature hashing for large scale multitask learning , 2009, ICML '09.

[41]  Roberto Basili,et al.  KeLP: a Kernel-based Learning Platform for Natural Language Processing , 2015, ACL.

[42]  Mark Steedman,et al.  Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning , 2012 .

[43]  Philipp Koehn,et al.  Abstract Meaning Representation for Sembanking , 2013, LAW@ACL.

[44]  H. Damasio,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence: Special Issue on Perceptual Organization in Computer Vision , 1998 .

[45]  Aram Galstyan,et al.  Variational Information Maximization for Feature Selection , 2016, NIPS.

[46]  Takahiro Watanabe,et al.  Document Analysis and Recognition , 1999, Communications in Computer and Information Science.

[47]  Kristen Grauman,et al.  Kernelized locality-sensitive hashing for scalable image search , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[48]  A. Kraskov,et al.  Estimating mutual information. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[49]  Rongrong Ji,et al.  Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Svetlana Lazebnik,et al.  Locality-sensitive binary codes from shift-invariant kernels , 2009, NIPS.

[51]  Nick Cercone,et al.  Computational Linguistics , 1986, Communications in Computer and Information Science.

[52]  Michael I. Jordan,et al.  Advances in Neural Information Processing Systems 30 , 1995 .

[53]  Olivier Buisson,et al.  Random maximum margin hashing , 2011, CVPR 2011.

[54]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[55]  Hal Daumé,et al.  Fast Large-Scale Approximate Graph Construction for NLP , 2012, EMNLP.

[56]  Xing Shi,et al.  Speeding Up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary , 2017, ACL.