Scalable semi-supervised grammar induction using cross-linguistically parameterized syntactic prototypes

This thesis addresses the task of unsupervised parser induction: automatically learning grammars and parsing models from raw text. We induce such parsers by observing only sequences of terminal symbols, and we focus on overcoming the problem of frequent collocation, a major source of error in grammar induction. For example, since a verb and a determiner tend to co-occur within a verb phrase, the probability of attaching the determiner to the verb is sometimes higher than that of attaching the head noun to the verb, yielding the erroneous attachment *((Verb Det) Noun) instead of (Verb (Det Noun)). Although collocation statistics lie at the heart of grammar induction, they can badly distort the induced grammar distribution.

Natural language grammars follow a Zipfian (power-law) distribution, in which the frequency of a grammar rule is inversely proportional to its rank in the frequency table. We therefore expect that covering the most frequent grammar rules will have a strong impact on induction accuracy. We propose an efficient approach to grammar induction guided by cross-linguistic language parameters: 33 parameters describing frequent basic word orders, which are easy to elicit from grammar compendiums or from short interviews with naïve language informants. These parameters are designed to capture the frequent word orders at the head of the Zipfian distribution, while the rest of the grammar, including exceptions, is induced automatically from unlabeled data. The language parameters shrink the search space of the grammar induction problem by exploiting both word-order information and predefined attachment directions.

The contribution of this thesis is three-fold. (1) We show that the language parameters generalize adequately across languages: our grammar induction experiments cover 14 languages on top of a simple unsupervised grammar induction system. (2) Our specification of language parameters improves the accuracy of unsupervised parsing even on longer sentences, where the parser is exposed to much less frequent linguistic phenomena, with accuracy degrading by less than 10%. (3) We investigate the prevalent sources of error in grammar induction, identifying room for further accuracy improvement.
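The attachment error described above can be made concrete with a toy sketch. All counts and words here are invented for illustration and do not come from the thesis; the point is only that raw co-occurrence frequency can prefer the wrong tree.

```python
# Toy illustration (invented counts): why raw co-occurrence statistics
# can favor the erroneous attachment *((Verb Det) Noun).
from collections import Counter

# Hypothetical corpus counts of (head, dependent) word pairs.
pair_counts = Counter({
    ("eat", "the"): 500,    # verb and determiner co-occur very often
    ("eat", "apple"): 40,   # verb and its true noun argument are rarer
    ("apple", "the"): 60,   # determiner modifying its noun
})
total = sum(pair_counts.values())

def attach_prob(head, dep):
    """Relative frequency of dep attaching to head."""
    return pair_counts[(head, dep)] / total

# Wrong parse ((eat the) apple): "the" attaches to the verb "eat".
wrong = attach_prob("eat", "the") * attach_prob("eat", "apple")
# Right parse (eat (the apple)): "the" attaches to the noun "apple".
right = attach_prob("apple", "the") * attach_prob("eat", "apple")

assert wrong > right  # frequency alone prefers the wrong tree
```

A predefined attachment direction (determiner attaches to noun) of the kind encoded by the language parameters rules out the high-frequency but wrong analysis before any statistics are consulted.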
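The claim that covering the most frequent rules has a strong impact can be sketched numerically. Assuming an idealized Zipfian distribution f(r) ∝ 1/r over grammar rules (the rule inventory size below is an arbitrary assumption, not thesis data), a handful of top-ranked rules already holds a large share of the probability mass:

```python
# Sketch under an idealized Zipf law f(r) ∝ 1/r over grammar rules.
def zipf_coverage(top_k, n_rules):
    """Fraction of total probability mass held by the top_k ranked rules."""
    weights = [1.0 / r for r in range(1, n_rules + 1)]
    return sum(weights[:top_k]) / sum(weights)

# With a hypothetical inventory of 10,000 rules, the 33 top-ranked
# rules (matching the number of language parameters) cover a large
# share of all rule occurrences.
coverage = zipf_coverage(33, 10_000)
print(f"top-33 coverage: {coverage:.1%}")
```

Under this idealization, the coverage of the top 33 rules is roughly 40%, which is why hand-specifying only the most frequent word orders, and leaving the long tail to unsupervised learning, is a plausible division of labor.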
