Parsing of Grammatical Relations in Transcripts of Parent-Child Dialogs

Automatic analysis of syntax is one of the core problems in natural language processing. Despite significant advances in syntactic parsing of written text, the application of these techniques to spontaneous spoken language has received more limited attention. The recent explosive growth of online, accessible corpora of spoken language interactions opens up new opportunities for the development of high-accuracy parsing approaches to the analysis of spoken language. The availability of high-accuracy parsers will in turn provide a platform for the development of a wide range of new applications, as well as for advanced research on the nature of conversational interactions. One concrete field of investigation that is ripe for the application of such parsing tools is the study of child language acquisition. In this thesis, we describe an approach for analyzing the syntactic structure of spontaneous conversational language in parent-child interactions. Specific emphasis is placed on the challenge of accurately annotating the English corpora in the CHILDES database with grammatical relations (such as subjects, objects, and adjuncts) that are of particular interest and utility to researchers in child language acquisition. This work involves rule-based and corpus-based natural language processing techniques, as well as a methodology for combining results from different parsing approaches. We present novel strategies for integrating the results of different parsers into a system with improved accuracy. One practical application of this research is the automation of language competence measures used by clinicians and researchers of child language development. We present an implementation of an automatic version of one such measurement scheme. This provides not only a useful tool for the child language research community, but also a task-based evaluation framework for grammatical relation identification.
Through experiments using data from the Penn Treebank, we show that several of the techniques and ideas presented in this thesis are applicable not just to analysis of parent-child dialogs, but to parsing in general.
