Improved Large Margin Dependency Parsing via Local Constraints and Laplacian Regularization

We present an improved approach for learning dependency parsers from treebank data. Our technique is based on two ideas for improving large-margin training in the context of dependency parsing. First, we incorporate local constraints that enforce the correctness of each individual link, rather than only scoring the global parse tree. Second, to cope with sparse data, we smooth the lexical parameters according to their underlying word similarities using Laplacian regularization. To demonstrate the benefits of our approach, we consider the problem of parsing Chinese treebank data using only lexical features, that is, without part-of-speech tags or grammatical categories. We achieve state-of-the-art performance, improving upon current large-margin approaches.
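The two ideas above can be illustrated with a small Python sketch. It is a toy built on stated assumptions, not the paper's implementation. The function names, the margin of 1, the dictionary-based link scores, and the word-similarity weights are all hypothetical. `local_hinge_loss` treats each gold link h -> m as its own constraint, requiring it to outscore every alternative head for the same dependent m by the margin. `laplacian_penalty` computes one standard form of Laplacian regularizer, the sum over similar word pairs of S_ab ||w_a - w_b||^2. This quantity is small when similar words have similar parameter vectors. `trace_form` checks that it equals tr(W^T L W), where L = D - S is the graph Laplacian of the similarity weights.

```python
# Hypothetical sketch: names, margin, and toy data are illustrative assumptions,
# not the paper's code. Shows per-link margin constraints and a Laplacian penalty.


def local_hinge_loss(heads, score, margin=1.0):
    """Sum of per-link hinge losses for one sentence.

    heads[m] is the gold head of token m (tokens are 1..n, 0 is the root;
    heads[0] is an unused placeholder). score(h, m) is the model score of
    the link h -> m. Each gold link must beat every other candidate head
    for the same dependent by `margin`; otherwise it adds a positive loss.
    """
    n = len(heads) - 1
    loss = 0.0
    for m in range(1, n + 1):
        gold = heads[m]
        best_wrong = max(
            (score(h, m) for h in range(n + 1) if h not in (gold, m)),
            default=float("-inf"),  # no alternative head: constraint is vacuous
        )
        loss += max(0.0, margin + best_wrong - score(gold, m))
    return loss


def laplacian_penalty(params, similarity):
    """Sum over word pairs (a, b) of S_ab * ||w_a - w_b||^2.

    params maps each word to its lexical parameter vector; similarity maps
    each unordered word pair (listed once) to a nonnegative weight S_ab.
    """
    total = 0.0
    for (a, b), s in similarity.items():
        total += s * sum((x - y) ** 2 for x, y in zip(params[a], params[b]))
    return total


def graph_laplacian(words, similarity):
    """Build L = D - S as a dense list-of-lists matrix indexed like `words`."""
    idx = {w: k for k, w in enumerate(words)}
    lap = [[0.0] * len(words) for _ in words]
    for (a, b), s in similarity.items():
        i, j = idx[a], idx[b]
        lap[i][j] -= s
        lap[j][i] -= s
        lap[i][i] += s
        lap[j][j] += s
    return lap


def trace_form(words, params, lap):
    """tr(W^T L W): sum over parameter dimensions d of w_d^T L w_d."""
    total = 0.0
    for d in range(len(params[words[0]])):
        col = [params[w][d] for w in words]
        total += sum(
            col[i] * lap[i][j] * col[j]
            for i in range(len(words))
            for j in range(len(words))
        )
    return total


if __name__ == "__main__":
    # Toy sentence: token 2 is the root; tokens 1 and 3 depend on token 2.
    heads = [None, 2, 0, 2]
    scores = {(2, 1): 2.0, (0, 2): 1.5, (2, 3): 0.5, (1, 3): 0.8}

    def score(h, m):
        # Unlisted links score 0.0 in this toy example.
        return scores.get((h, m), 0.0)

    print(local_hinge_loss(heads, score))  # prints 1.3: only link 2 -> 3 misses its margin

    # Toy lexicon: "bank" is marked similar to both "money" and "river".
    words = ["bank", "river", "money"]
    params = {"bank": [1.0, 0.0], "river": [0.0, 1.0], "money": [1.0, 1.0]}
    similarity = {("bank", "money"): 0.5, ("bank", "river"): 0.2}
    print(laplacian_penalty(params, similarity))  # prints 0.9
    # Agrees with the pairwise form up to floating-point rounding.
    print(trace_form(words, params, graph_laplacian(words, similarity)))
```

In a full trainer (again an assumption), the per-link losses would be summed over the treebank and a weighted Laplacian penalty added to the objective. The penalty pulls the parameters of rare words toward those of words with similar distributions, which is the smoothing effect the abstract describes for sparse data.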
