From Exemplar to Grammar: Integrating Analogy and Probability in Language Learning

We present a new model of language learning based on the following idea: if a language learner does not know which phrase-structure trees should be assigned to initial sentences, s/he implicitly allows for all possible trees and lets linguistic experience decide which is the ‘best’ tree for each sentence. The best tree is obtained by maximizing ‘structural analogy’ between a sentence and previous sentences, formalized as the most probable shortest combination of subtrees from all trees of previous sentences. Corpus-based experiments with this model on the Penn Treebank and the CHILDES database indicate that it can learn both exemplar-based and rule-based aspects of language, ranging from phrasal verbs to auxiliary fronting. By learning the syntactic structures of sentences, the model also learns the grammar implicit in these structures, which can in turn be used to produce new sentences. We show that our model mimics children’s language development from item-based constructions to abstract constructions, and that it can simulate some of the errors children make in producing complex questions.
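The starting point of the model, allowing for all possible trees of a sentence, can be illustrated with a minimal sketch. It assumes, purely for illustration, that trees are unlabeled and binary; the abstract itself does not state this restriction. The function name `all_binary_trees` and the nested-tuple tree encoding are hypothetical and not taken from the paper.

```python
from functools import lru_cache


def all_binary_trees(words):
    """Return every unlabeled binary bracketing of `words` as nested tuples.

    Assumption (not from the abstract): trees are binary and unlabeled.
    A leaf is a word string; an internal node is a (left, right) pair.
    """
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(i, j):
        # All trees spanning words[i:j].
        if j - i == 1:
            return (words[i],)
        trees = []
        for k in range(i + 1, j):  # every split point inside the span
            for left in build(i, k):
                for right in build(k, j):
                    trees.append((left, right))
        return tuple(trees)

    return list(build(0, len(words)))


if __name__ == "__main__":
    for tree in all_binary_trees("the dog barks".split()):
        print(tree)
```

The number of such trees grows as the Catalan numbers in sentence length, so a practical implementation would need a compact shared representation rather than explicit enumeration. The sketch covers only the candidate-tree step, not the selection of the most probable shortest combination of subtrees.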
