POS Tags and Decision Trees for Language Modeling

Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their part-of-speech (POS) tags, thereby incorporating POS tagging into recognition. To use POS tags effectively, we apply clustering and decision tree algorithms, which let generalizations across POS tags and words inform the estimation of the probability distributions. On the Trains corpus, our POS model reduces both word error rate and perplexity relative to word-based and class-based approaches. Using the Wall Street Journal corpus, we show that the approach scales up when more training data is available.
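
The redefinition described above can be sketched as a pair of equations. This is an illustrative formulation under stated assumptions, not necessarily the exact notation of the paper: A stands for the acoustic signal, W_{1,N} for the word sequence, and T_{1,N} for the corresponding POS tag sequence (T is used for tags here only to avoid clashing with P for probability). The first line is the conventional word-only objective, the second is the joint word-and-tag objective, and the third is the chain-rule factorization of the joint language model into a word term and a tag term, each conditioned on the full preceding context.

```latex
\begin{align*}
\hat{W} &= \arg\max_{W} P(W \mid A)
         = \arg\max_{W} P(A \mid W)\, P(W) \\
(\hat{W}, \hat{T}) &= \arg\max_{W,T} P(W, T \mid A)
         = \arg\max_{W,T} P(A \mid W, T)\, P(W, T) \\
P(W_{1,N}, T_{1,N}) &= \prod_{i=1}^{N}
         P(W_i \mid W_{1,i-1}, T_{1,i})\;
         P(T_i \mid W_{1,i-1}, T_{1,i-1})
\end{align*}
```

Under this reading, the two conditional distributions in the product are the ones that would be estimated with the clustering and decision tree algorithms mentioned in the abstract, since their conditioning contexts are too rich to estimate by direct counting.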
