POS Tagging versus Classes in Language Modeling

Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their part-of-speech (POS) tags, thereby incorporating POS tagging into the language model. POS tags allow more sophisticated generalizations than a class-based approach affords. Furthermore, when speech repair and intonational phrase modeling are incorporated into the language model, using POS tags rather than classes gives better performance.
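The redefinition can be sketched in standard notation. The symbols here are assumptions introduced for illustration and do not appear in the abstract: A is the acoustic signal, W = W_1 ... W_N the word sequence, P = P_1 ... P_N the corresponding POS tags, and C_i the class of word W_i in a class-based model. The first line is the usual word-only formulation, the second the joint word-and-tag formulation, the third a chain-rule factorization of the joint language model, and the last a typical class-based bigram approximation for comparison.

```latex
\begin{align}
\hat{W} &= \arg\max_{W} \Pr(W \mid A)
         = \arg\max_{W} \Pr(A \mid W)\,\Pr(W) \\
(\hat{W}, \hat{P}) &= \arg\max_{W,P} \Pr(W, P \mid A)
         = \arg\max_{W,P} \Pr(A \mid W, P)\,\Pr(W, P) \\
\Pr(W, P) &= \prod_{i=1}^{N}
           \Pr(W_i \mid W_{1,i-1}, P_{1,i})\,
           \Pr(P_i \mid W_{1,i-1}, P_{1,i-1}) \\
\Pr(W_i \mid W_{1,i-1}) &\approx \Pr(W_i \mid C_i)\,\Pr(C_i \mid C_{i-1})
\end{align}
```

In the class-based approximation each word is predicted only through its class, whereas the joint factorization lets both the word and the tag probabilities condition on the preceding words and tags; this is one way to read the claim that POS tags permit richer generalizations.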
