Scalable Discriminative Learning for Natural Language Parsing and Translation

Parsing and translating natural languages can be viewed as problems of predicting tree structures. For machine learning approaches to these predictions, the diversity and high dimensionality of the structures involved mandate very large training sets. This paper presents a purely discriminative learning method that scales up well to problems of this size. On a standard parsing task, its accuracy was at least as good as that of other comparable methods. To our knowledge, it is the first purely discriminative learning algorithm for translation with tree-structured models. Unlike other popular methods, this method does not require extensive a priori feature engineering, because it performs feature selection over a compound feature space as it learns. Experiments demonstrate the method's versatility, accuracy, and efficiency. Relevant software is freely available at http://nlp.cs.nyu.edu/parser and http://nlp.cs.nyu.edu/GenPar.
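The abstract does not spell out the learning algorithm. As a rough illustration of what "feature selection over a compound feature space as it learns" can mean, here is a minimal Python sketch of grafting-style incremental feature selection for L1-regularized logistic regression, where candidate features are conjunctions of atomic features. This is not the paper's method: the logistic loss, the proximal-gradient (ISTA) refitting step, the conjunction size limit, the toy data, and every name in the code (`graft`, `loss_gradient`, `margins`, `toy`) are assumptions made for illustration.

```python
import math
from itertools import chain, combinations

# Illustrative sketch only (assumed design, not the paper's algorithm):
# greedy L1-regularized feature selection over conjunctions of atomic features.


def margins(data, weights):
    """Score of each example: sum of the weights of active features that fire on it."""
    return [sum(w for f, w in weights.items() if f <= x) for x, _ in data]


def loss_gradient(data, feature, m):
    """Partial derivative of the summed logistic loss log(1 + exp(-y*m))
    with respect to the weight of `feature`, at the current margins `m`."""
    g = 0.0
    for (x, y), mi in zip(data, m):
        if feature <= x:  # a conjunction fires iff all of its atoms are present
            g -= y / (1.0 + math.exp(y * mi))
    return g


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def graft(data, lam=0.1, max_size=2, rounds=10, steps=50, lr=0.1):
    """Add one compound feature per round, then refit all active weights."""
    atoms = set(chain.from_iterable(x for x, _ in data))
    # The compound feature space: all conjunctions of 1..max_size atoms.
    compound = [frozenset(c) for k in range(1, max_size + 1)
                for c in combinations(sorted(atoms), k)]
    weights = {}
    for _ in range(rounds):
        m = margins(data, weights)
        inactive = [f for f in compound if f not in weights]
        if not inactive:
            break
        best = max(inactive, key=lambda f: abs(loss_gradient(data, f, m)))
        # Stop when no inactive feature's gradient magnitude exceeds the L1 penalty.
        if abs(loss_gradient(data, best, m)) <= lam:
            break
        weights[best] = 0.0
        # Refit active weights by proximal gradient descent on loss + lam * |w|.
        for _ in range(steps):
            m = margins(data, weights)
            grads = {f: loss_gradient(data, f, m) for f in weights}
            for f, g in grads.items():
                weights[f] = soft_threshold(weights[f] - lr * g, lr * lam)
        # Features shrunk exactly to zero drop out of the active set.
        weights = {f: w for f, w in weights.items() if w != 0.0}
    return weights


# Hypothetical toy data: the label is +1 only when atoms "a" and "b" co-occur.
toy = [({"a", "b"}, +1), ({"a", "b"}, +1), ({"a"}, -1), ({"b"}, -1), ({"c"}, -1)]
print(graft(toy))
```

On the toy data, the first feature selected is the conjunction {a, b}, because it fires only on positive examples. Enumerating every conjunction up to `max_size` grows combinatorially with the number of atomic features, so a learner meant to scale to very large training sets would need to generate candidate compound features lazily instead of listing them up front.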