Online Large-Margin Training of Dependency Parsers

We present an effective training algorithm for linearly-scored dependency parsers that implements online large-margin multi-class training (Crammer and Singer, 2003; Crammer et al., 2003) on top of efficient parsing techniques for dependency trees (Eisner, 1996). The trained parsers achieve a competitive dependency accuracy for both English and Czech with no language specific enhancements.

[1]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[2]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[3]  Jason Eisner,et al.  Three New Probabilistic Models for Dependency Parsing: An Exploration , 1996, COLING.

[4]  Michael Collins,et al.  A Statistical Parser for Czech , 1999, ACL.

[5]  Giorgio Satta,et al.  Efficient Parsing for Bilexical Context-Free Grammars and Head Automaton Grammars , 1999, ACL.

[6]  Eugene Charniak,et al.  A Maximum-Entropy-Inspired Parser , 2000, ANLP.

[7]  Andrzej Stachurski,et al.  Parallel Optimization: Theory, Algorithms and Applications , 2000, Parallel Distributed Comput. Pract..

[8]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[9]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[10]  Thorsten Joachims,et al.  Learning to classify text using support vector machines - methods, theory and algorithms , 2002, The Kluwer international series in engineering and computer science.

[11]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[12]  Satoshi Sekine,et al.  Automatic paraphrase acquisition from news articles , 2002 .

[13]  Mark Johnson,et al.  Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques , 2002, ACL.

[14]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[15]  Koby Crammer,et al.  Ultraconservative Online Algorithms for Multiclass Problems , 2001, J. Mach. Learn. Res..

[16]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[17]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.

[18]  Roberto Basili,et al.  Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms by Thorsten Joachims , 2003, Comput. Linguistics.

[19]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[20]  Yuji Matsumoto,et al.  Statistical Dependency Analysis with Support Vector Machines , 2003, IWPT.

[21]  Daniel M. Bikel,et al.  Intricacies of Collins’ Parsing Model , 2004, CL.

[22]  Ben Taskar,et al.  Max-Margin Parsing , 2004, EMNLP.

[23]  Brian Roark,et al.  Incremental Parsing with the Perceptron Algorithm , 2004, ACL.

[24]  Aron Culotta,et al.  Dependency Tree Kernels for Relation Extraction , 2004, ACL.

[25]  Joakim Nivre,et al.  Deterministic Dependency Parsing of English Text , 2004, COLING.

[26]  James R. Curran,et al.  Parsing the WSJ Using CCG and Log-Linear Models , 2004, ACL.

[27]  Adwait Ratnaparkhi,et al.  Learning to Parse Natural Language with Maximum Entropy Models , 1999, Machine Learning.

[28]  Yuan Ding,et al.  Machine Translation Using Probabilistic Synchronous Dependency Insertion Grammars , 2005, ACL.

[29]  David Chiang,et al.  Better k-best Parsing , 2005, IWPT.

[30]  菅山 謙正,et al.  Word Grammar 理論の研究 , 2005 .