Efficient Online Learning for Mapping Kernels on Linguistic Structures

Kernel methods are popular and effective techniques for learning on structured data, such as trees and graphs. One of their major drawbacks is the computational cost of making a prediction on an example, which affects the classification phase of batch kernel methods and, even more, online learning algorithms, where a prediction is made at every training step. In this paper, we analyze how to speed up prediction when the kernel function is an instance of the Mapping Kernels, a general framework for specifying kernels for structured data that extends the popular convolution kernel framework. We study the general model theoretically, derive various optimization strategies, and show how to apply them to popular kernels for structured data. Additionally, we provide reliable empirical evidence on semantic role labeling, a natural language classification task that depends heavily on syntactic trees. The results show that our faster approach can clearly improve on standard kernel-based SVMs, which cannot run on very large datasets.
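To make the prediction cost concrete, the following minimal sketch shows a plain kernel perceptron, not the optimized method studied in the paper. The names `bag_kernel` and `KernelPerceptron` are hypothetical, and the token-count kernel is only a toy stand-in for a tree kernel. Under these assumptions, each prediction evaluates the kernel once per stored example, so the cost grows as the model accumulates mistakes.

```python
from collections import Counter


def bag_kernel(a: str, b: str) -> float:
    """Toy stand-in for a structure kernel: dot product of token counts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    return float(sum(ca[t] * cb[t] for t in ca))


class KernelPerceptron:
    """Plain online kernel perceptron (illustrative, no speed-up applied)."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.support = []       # stored (example, label) pairs
        self.kernel_evals = 0   # counts kernel calls to expose the cost

    def score(self, x: str) -> float:
        # One kernel evaluation per stored example: this loop is the
        # bottleneck that grows with the size of the model.
        total = 0.0
        for sv, y in self.support:
            total += y * self.kernel(sv, x)
            self.kernel_evals += 1
        return total

    def update(self, x: str, y: int) -> bool:
        # Predict first, then store the example only on a mistake.
        if y * self.score(x) <= 0:
            self.support.append((x, y))
            return True
        return False


if __name__ == "__main__":
    model = KernelPerceptron(bag_kernel)
    stream = [("a b", 1), ("c d", -1), ("a a", 1), ("c", -1)]
    for x, y in stream:
        model.update(x, y)
    print(len(model.support), model.kernel_evals)
```

In an online setting `score` is called on every incoming example, so any reduction in the per-prediction kernel cost applies to training as well as to classification.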
