Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization

We present a novel approach for inducing unsupervised dependency parsers for languages that have no labeled training data but do have translated text in a resource-rich language. We train probabilistic parsing models for resource-poor languages by transferring cross-lingual knowledge from a resource-rich language with entropy regularization. Our method can be used as a purely monolingual dependency parser, requiring no human translations for the test data, which makes it applicable to a wide range of resource-poor languages. We perform experiments on three data sets: versions 1.0 and 2.0 of the Google Universal Dependency Treebanks and the treebanks from the CoNLL shared tasks, across ten languages. We obtain state-of-the-art performance on all three data sets when compared with previously studied unsupervised and projected parsing systems.
