Shallow Parsing Using Probabilistic Grammatical Inference

This paper presents an application of grammatical inference to the task of shallow parsing. We first learn a deterministic probabilistic automaton that models the joint distribution of chunk (syntactic phrase) tags and part-of-speech tags, and then use this automaton as a transducer to find the most likely chunk tag sequence with a dynamic programming algorithm. We discuss an efficient means of incorporating lexical information, which uses a mutual information criterion to identify automatically the particular words that are useful, together with an application of bagging that improves our results. Though the results are not as high as those of comparable techniques that use models with a fixed structure, the models we learn are very compact and efficient.
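The decoding step described above can be illustrated with a small sketch. It is not the paper's implementation: the function name `viterbi_chunks`, the dictionary layout of the automaton, and the toy probabilities in `TOY_TRANSITIONS` are all assumptions made for illustration, and stopping probabilities are ignored. The sketch assumes an automaton whose symbols are (part-of-speech tag, chunk tag) pairs and which is deterministic on those pairs, so each chunk tag sequence follows exactly one path and the best path gives the best chunk sequence.

```python
import math

# Hypothetical toy automaton: state -> {(pos_tag, chunk_tag): (next_state, probability)}.
# The numbers are invented for illustration only.
TOY_TRANSITIONS = {
    0: {("DT", "B-NP"): (1, 0.6), ("DT", "O"): (0, 0.1), ("VB", "B-VP"): (2, 0.3)},
    1: {("NN", "I-NP"): (1, 0.7), ("NN", "B-NP"): (1, 0.1), ("VB", "B-VP"): (2, 0.2)},
    2: {("DT", "B-NP"): (1, 0.8), ("NN", "B-NP"): (1, 0.2)},
}


def viterbi_chunks(pos_tags, transitions, start_state=0):
    """Return (chunk_tags, log_prob) for the most likely path, or None if no path exists."""
    # best maps each reachable state to the best (log_prob, chunk_tags) ending there.
    best = {start_state: (0.0, [])}
    for pos in pos_tags:
        nxt = {}
        for state, (log_prob, chunks) in best.items():
            for (p, chunk), (target, prob) in transitions.get(state, {}).items():
                # Only joint symbols whose POS component matches the input are allowed.
                if p != pos or prob <= 0.0:
                    continue
                cand = log_prob + math.log(prob)
                if target not in nxt or cand > nxt[target][0]:
                    nxt[target] = (cand, chunks + [chunk])
        if not nxt:
            return None  # the automaton cannot read this POS sequence
        best = nxt
    log_prob, chunks = max(best.values(), key=lambda item: item[0])
    return chunks, log_prob


result = viterbi_chunks(["DT", "NN", "VB", "DT", "NN"], TOY_TRANSITIONS)
print(result)
```

Keeping only one hypothesis per state at each position is what makes the search linear in the sentence length for a fixed automaton. A fuller version would also need smoothing for unseen part-of-speech sequences, which this sketch simply reports as `None`.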
