Efficient Linearization of Tree Kernel Functions

The combination of Support Vector Machines with very high dimensional kernels, such as string or tree kernels, suffers from two major drawbacks: first, the implicit representation of the feature space does not allow us to understand which features actually triggered the generalization; second, the resulting computational burden may in some cases make it infeasible to train on large data sets. We propose an approach based on feature space reverse engineering to tackle both problems. Our experiments with Tree Kernels on a Semantic Role Labeling data set show that the proposed approach can drastically reduce the computational footprint while leaving accuracy almost unaffected.
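The core idea of linearization can be sketched as follows: instead of evaluating a tree kernel implicitly over pairs of trees, map each tree to an explicit sparse vector indexed by tree fragments, so the kernel value becomes an ordinary dot product that a linear classifier can use directly. The sketch below is illustrative only: it uses the plain subtree kernel (full subtrees rooted at each node) rather than the paper's mined-fragment feature space, and all names are hypothetical.

```python
# Hedged sketch: explicit (linearized) feature map for the plain
# subtree kernel. Trees are nested tuples, e.g. ("NP", "John");
# leaves are plain strings.
from collections import Counter

def phi(tree):
    """Explicit feature map: count every full subtree of `tree`."""
    feats = Counter()

    def encode(t):
        if isinstance(t, str):          # leaf node
            feats[t] += 1
            return t
        # bracketed string encoding of the full subtree rooted here
        enc = "(" + t[0] + " " + " ".join(encode(c) for c in t[1:]) + ")"
        feats[enc] += 1                 # one feature per internal node
        return enc

    encode(tree)
    return feats

def linear_kernel(t1, t2):
    """Dot product in the explicit fragment space."""
    f1, f2 = phi(t1), phi(t2)
    return sum(v * f2[k] for k, v in f1.items())

t1 = ("S", ("NP", "John"), ("VP", "runs"))
t2 = ("S", ("NP", "Mary"), ("VP", "runs"))
print(linear_kernel(t1, t2))  # -> 2: shared fragments "runs" and "(VP runs)"
```

Once the fragment space is made explicit like this, irrelevant fragments can be pruned and the classifier trained as a fast linear model, which is the computational saving the abstract refers to.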
