Fast Unsupervised Dependency Parsing with Arc-Standard Transitions

Unsupervised dependency parsing is one of the most challenging tasks in natural language processing. The task involves finding the best possible dependency trees for raw sentences without any aid from annotated data. In this paper, we show that applying a supervised incremental parsing model to unsupervised parsing yields a parser with linear time complexity that is faster than other methods. With only 15 training iterations of linear time complexity, we obtain results comparable to those of other state-of-the-art methods. By employing two simple universal linguistic rules inspired by classical dependency grammar, we improve the results for some languages and achieve state-of-the-art results. We also test our model on a part of the ongoing Persian dependency treebank. This is the first such work on the Persian language.
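
To make the linear-time claim concrete, the following is a minimal sketch of an arc-standard transition system, not the model described in this paper. It assumes the common formulation in which LEFT-ARC and RIGHT-ARC attach the top two stack items; the function name apply_transitions and the example sentence are hypothetical. How the transitions are scored or learned without annotated data is not shown.

```python
from typing import List, Tuple

# Transition names for a generic arc-standard system (illustrative labels).
SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


def apply_transitions(n_words: int, transitions: List[str]) -> Tuple[List[int], List[int]]:
    """Apply a transition sequence to a sentence of n_words tokens.

    Returns (heads, stack), where heads[i] is the index of the head of
    word i, or -1 if word i never received a head (the root).
    """
    stack: List[int] = []
    next_word = 0               # index of the first word still in the buffer
    heads = [-1] * n_words

    for t in transitions:
        if t == SHIFT:
            if next_word >= n_words:
                raise ValueError("SHIFT with an empty buffer")
            stack.append(next_word)
            next_word += 1
        elif t in (LEFT_ARC, RIGHT_ARC):
            if len(stack) < 2:
                raise ValueError(f"{t} needs two items on the stack")
            top = stack.pop()
            second = stack.pop()
            if t == LEFT_ARC:
                heads[second] = top     # top of stack becomes the head
                stack.append(top)
            else:
                heads[top] = second     # second item becomes the head
                stack.append(second)
        else:
            raise ValueError(f"unknown transition: {t}")
    return heads, stack


# Hypothetical example: "the dog barks" (word indices 0, 1, 2).
sequence = [SHIFT, SHIFT, LEFT_ARC, SHIFT, LEFT_ARC]
heads, stack = apply_transitions(3, sequence)
print(heads)   # [1, 2, -1]
print(stack)   # [2]
```

Each word is shifted exactly once and each arc transition removes one item from the stack, so a complete parse of n words takes n shifts and n - 1 arc transitions, that is 2n - 1 steps. This is why transition-based parsing of this kind runs in time linear in sentence length.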
