An empirical generative framework for computational modeling of language acquisition

This paper reports progress toward a computer model of language acquisition that takes the form of (1) a generative grammar that is (2) algorithmically learnable from realistic corpus data, (3) viable in its large-scale quantitative performance, and (4) psychologically real. First, we describe new algorithmic methods for the unsupervised learning of generative grammars from raw CHILDES data and report on the generative performance of the acquired grammars. Next, we summarize findings from recent longitudinal and experimental work suggesting how certain statistically prominent structural properties of child-directed speech may facilitate language acquisition. We then present a series of new analyses of CHILDES data indicating that these properties are indeed present in realistic corpora of child-directed speech. Finally, we suggest how our computational results, behavioral findings, and corpus-based insights can be integrated into a next-generation model that aims to meet all four requirements of our modeling framework.
