Automatic parsing of parental verbal input

To evaluate theoretical proposals regarding the course of child language acquisition, researchers often need to process large numbers of syntactically parsed utterances, both from children and from their parents. Because parsing at this scale is so difficult to do by hand, there are currently no parsed corpora of child language input data. To automate this process, we developed a system that combines the MOR tagger, a rule-based parser, and statistical disambiguation techniques. The resulting system obtained nearly 80% correct parses for the sentences spoken to children. To achieve this level, we had to construct a particular processing sequence that minimizes problems caused by the coverage/ambiguity tradeoff in parser design. These procedures are particularly appropriate for use with the CHILDES database, an international corpus of transcripts. The data and programs are now freely available over the Internet.
