Statistical parsing and language modeling based on constraint dependency grammar

This thesis focuses on the development of effective and efficient language models (LMs) for speech recognition systems. We selected Constraint Dependency Grammar (CDG) as the underlying framework for two reasons. First, CDG parses can be lexicalized at the word level with a rich set of lexical features for modeling subcategorization and wh-movement, without a combinatorial explosion of the parameter space. Second, CDG is able to model languages with crossing dependencies and free word order. Two types of LMs were developed: an almost-parsing LM and a full CDG parser-based LM. The quality of both LMs benefited significantly from insights obtained in initial CDG grammar induction experiments. The almost-parsing LM uses a data structure derived from CDG parses, called a SuperARV, that tightly integrates knowledge of words, lexical features, and syntactic constraints. The full CDG parser-based LM uses complete parse information, obtained by adding modifiee links to the SuperARVs assigned to each word in a sentence, in order to capture important long-distance dependency constraints. We evaluated the almost-parsing LM on a variety of large vocabulary continuous speech recognition (LVCSR) tasks and found that it reduced recognition error rates significantly compared to commonly used word-based LMs, achieving performance competitive with state-of-the-art parser-based LMs at a significantly lower time complexity. The full CDG parser-based LM, when evaluated on the DARPA Wall Street Journal CSR task, outperformed the almost-parsing LM and achieved performance comparable to or exceeding that of state-of-the-art parser-based LMs.
