The Leaf Path Projection View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

We present a novel representation of parse trees as lists of paths (leaf projection paths) from leaves to the top level of the tree. This representation allows us to achieve significantly higher accuracy in the task of HPSG parse selection than standard models, and makes the application of string kernels natural. We define tree kernels via string kernels on projection paths and explore their performance in the context of parse disambiguation. We apply SVM ranking models and achieve an exact sentence accuracy of 85.40% on the Redwoods corpus.
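
As a rough illustration of the idea, the sketch below extracts leaf projection paths from a toy tree and compares two trees by applying a simple n-gram string kernel to the paths of matching words. It is a minimal sketch under stated assumptions, not the kernels evaluated in the paper. The nested-tuple tree encoding, the function names (`leaf_paths`, `ngram_counts`, `string_kernel`, `tree_kernel`), the choice of contiguous n-grams up to length 2, and the rule of pairing paths by identical leaf word are all hypothetical choices made for the example.

```python
from collections import Counter

# Assumed toy encoding: a tree is (label, child, child, ...); a leaf is a plain string.

def leaf_paths(tree, above=()):
    """Return (word, path) pairs, where path lists node labels from the leaf up to the root."""
    if isinstance(tree, str):
        return [(tree, list(reversed(above)))]
    label, *children = tree
    pairs = []
    for child in children:
        pairs.extend(leaf_paths(child, above + (label,)))
    return pairs

def ngram_counts(seq, n_max=2):
    """Count all contiguous n-grams of length 1..n_max in a sequence of labels."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

def string_kernel(a, b, n_max=2):
    """Inner product of the n-gram count vectors of two label sequences."""
    ca, cb = ngram_counts(a, n_max), ngram_counts(b, n_max)
    return sum(v * cb[k] for k, v in ca.items())

def tree_kernel(t1, t2, n_max=2):
    """Sum string-kernel values over pairs of paths that project from the same word (assumed pairing rule)."""
    total = 0
    for w1, p1 in leaf_paths(t1):
        for w2, p2 in leaf_paths(t2):
            if w1 == w2:
                total += string_kernel(p1, p2, n_max)
    return total

# Two hypothetical analyses of the same sentence.
t1 = ("S", ("NP", "dogs"), ("VP", "bark"))
t2 = ("S", ("NP", "dogs"), ("VP", ("V", "bark")))
paths = leaf_paths(t1)   # [("dogs", ["NP", "S"]), ("bark", ["VP", "S"])]
similarity = tree_kernel(t1, t2)
```

Because each tree is reduced to a list of label sequences, any string kernel can be substituted for `string_kernel` without changing the rest of the sketch. In a parse selection setting, such a kernel could then be passed to a ranking learner that scores the candidate analyses of a sentence.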
