Baby Steps: How “Less is More” in Unsupervised Dependency Parsing

We present an empirical study of two very simple approaches to unsupervised grammar induction. Both are based on Klein and Manning’s Dependency Model with Valence. The first, Baby Steps, requires no initialization and bootstraps itself via iterated learning of increasingly longer sentences. This method substantially exceeds Klein and Manning’s published numbers and achieves 39.4% accuracy on Section 23 of the Wall Street Journal corpus, a result that is already competitive with the recent state of the art. The second, Less is More, is based on the observation that there is sometimes a trade-off between the quantity and complexity of training data. Using the standard linguistically-informed prior but training at the “sweet spot” of sentences up to length 15, it attains 44.1% accuracy, beating the state of the art. Both results generalize to the Brown corpus and shed light on opportunities in the present state of unsupervised dependency parsing.
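The Baby Steps schedule described above can be sketched as a simple curriculum loop: train on all sentences up to length k, then warm-start training at length k+1 from the resulting parameters. This is a minimal illustration, assuming a hypothetical `run_em` routine that stands in for DMV parameter re-estimation; it is not the authors' implementation.

```python
# Hypothetical sketch of the "Baby Steps" schedule: train on sentences of
# length <= k, then reuse those parameters to initialize the next stage.
# `run_em` is a placeholder for DMV re-estimation (e.g., inside-outside EM).

def baby_steps(corpus, max_len, run_em):
    """corpus: list of tokenized sentences; returns final model parameters."""
    params = None  # stage 1 starts from a uniform model (no special init)
    for k in range(1, max_len + 1):
        # Each stage trains on the cumulative set of sentences of length <= k,
        # so earlier, simpler data is never discarded.
        batch = [s for s in corpus if len(s) <= k]
        if not batch:
            continue
        params = run_em(batch, init=params)  # warm-start from previous stage
    return params
```

The key design choice is that each stage's output parameters seed the next stage, so no hand-crafted initializer is needed; only the first, trivially short sentences are learned from scratch.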
