The unsupervised learning of natural language structure

There is precisely one complete language processing system to date: the human brain. Though the degree of built-in bias in human learners is debated, it is clear that we acquire language primarily in an unsupervised fashion. Computational approaches to language processing, by contrast, are almost exclusively supervised, relying on hand-labeled corpora for training. This reliance is largely due to the repeatedly discouraging performance of unsupervised approaches. In particular, the problem of learning syntax (grammar) from completely unannotated text has received a great deal of attention for well over a decade, with little in the way of positive results. We argue that previous methods for this task have generally underperformed because of the representations they used. Overly complex models are easily distracted by non-syntactic correlations (such as topical associations), while overly simple models are not rich enough to capture important first-order properties of language (such as directionality, adjacency, and valence).

In this work, we describe several syntactic representations and associated probabilistic models which are designed to capture the basic character of natural language syntax as directly as possible. First, we examine a nested, distributional method which induces bracketed tree structures. Second, we examine a dependency model which induces word-to-word dependency structures. Finally, we demonstrate that these two models perform better in combination than either does alone. With these representations, high-quality analyses can be learned from surprisingly little text, with no labeled examples, in several languages (we show experiments with English, German, and Chinese). Our results show above-baseline performance in unsupervised parsing in each of these languages.

Grammar induction methods are useful because parsed corpora exist for only a small number of languages. More generally, most high-level NLP tasks, such as machine translation and question answering, lack richly annotated corpora, making unsupervised methods extremely appealing even for common languages like English. Finally, while the models in this work are not intended to be cognitively plausible, their effectiveness can inform the investigation of which biases are or are not needed in the human acquisition of language.
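To make the first of these representations concrete: in a constituent-context style of model, each candidate span of a part-of-speech sequence is scored by how plausible its yield (the tags it covers) and its context (the tags immediately bordering it) are for a constituent. The sketch below shows only that span score, not the full model or its EM training; the probability tables `P_yield` and `P_context` and their values are hypothetical placeholders, not parameters learned in this work.

```python
# Hypothetical, illustrative parameters -- not values learned in this work.
# A span is scored as P(yield | constituent) * P(context | constituent),
# where the context is the pair of tags bordering the span
# ("<s>" and "</s>" mark the sentence boundaries).
P_yield = {("DT", "NN"): 0.20}
P_context = {("<s>", "VBZ"): 0.30}

UNSEEN = 1e-6  # small floor probability for unseen yields/contexts

def span_score(tags, i, j):
    """Score tags[i:j] as a constituent: P(yield) * P(context)."""
    span_yield = tuple(tags[i:j])
    left = tags[i - 1] if i > 0 else "<s>"
    right = tags[j] if j < len(tags) else "</s>"
    return (P_yield.get(span_yield, UNSEEN)
            * P_context.get((left, right), UNSEEN))

tags = ["DT", "NN", "VBZ"]
# The span "DT NN" with context (<s>, VBZ) scores 0.20 * 0.30 = 0.06.
score = span_score(tags, 0, 2)
```

A full model of this kind multiplies such scores over every span of a bracketing and re-estimates the two distributions with EM; the point here is only that the representation scores spans directly by their distributional signature rather than through grammar rules.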
