Word class modeling for speech recognition with out-of-task words using a hierarchical language model

Out-of-vocabulary (OOV) problems frequently arise when a language model is adapted to a new task in which certain word classes, such as personal names, place names and other proper nouns, are observed but few of their individual words are. Simple task adaptation cannot handle this problem properly. In this paper, we adopt a hierarchical language model for task-dependent OOV words in the noun category. In this model, the lower class model, which expresses word phonotactics, requires no additional task-dependent corpora for training. Because the proposed hierarchical model clearly separates inter-word and intra-word characteristics, the lower model can be trained independently of the upper class model, which is a conventional word-class N-gram. This independent, layer-wise training makes it possible to apply the model to general vocabularies and tasks in combination with conventional language model adaptation techniques. In speech recognition experiments, compared with a conventionally adapted model, the proposed model raised word accuracy on sentences containing OOV words by 19 points (from 54% to 73%) while keeping comparable accuracy (85%) on sentences without OOV words. The resulting accuracy corresponds to the performance obtained when all OOV words are ideally registered in the dictionary.
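To make the two-layer idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes an add-one-smoothed class bigram for the upper (inter-word) layer and an add-one-smoothed phoneme bigram per open class for the lower (intra-word) layer, so a word is scored as log P(c_i | c_{i-1}) + log P(w_i | c_i). The class names (`PRON`, `VERB`, `NAME`), the toy data, the smoothing, and the names `PhoneBigram`, `ClassBigram` and `sentence_logprob` are all hypothetical, and letters stand in for phonemes.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


class PhoneBigram:
    """Lower (intra-word) layer: add-one-smoothed phoneme bigram for one class.

    Trained on a plain word list (e.g. a generic name list), so it needs no
    task-dependent sentences. Hypothetical stand-in for a phonotactic model.
    """

    def __init__(self, words):
        self.counts = defaultdict(Counter)
        self.inventory = {EOS}
        for phones in words:
            self.inventory.update(phones)
            seq = [BOS, *phones, EOS]
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += 1

    def logprob(self, phones):
        """log P(phones | class), including the word-end transition."""
        seq = [BOS, *phones, EOS]
        v = len(self.inventory)
        total = 0.0
        for a, b in zip(seq, seq[1:]):
            row = self.counts.get(a, Counter())
            total += math.log((row[b] + 1) / (sum(row.values()) + v))
        return total


class ClassBigram:
    """Upper (inter-word) layer: add-one-smoothed class bigram from task data."""

    def __init__(self, class_sentences):
        self.counts = defaultdict(Counter)
        self.classes = {EOS}
        for classes in class_sentences:
            self.classes.update(classes)
            seq = [BOS, *classes, EOS]
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += 1

    def logprob(self, prev, cls):
        row = self.counts.get(prev, Counter())
        return math.log((row[cls] + 1) / (sum(row.values()) + len(self.classes)))


def sentence_logprob(tokens, upper, lexicon, lower):
    """Score (word, cls, phones) tokens as
    sum_i [log P(c_i | c_{i-1}) + log P(w_i | c_i)].

    Classes in `lower` get P(w|c) from phonotactics, so OOV words still
    receive a finite score; other classes use relative frequencies in `lexicon`.
    """
    total, prev = 0.0, BOS
    for word, cls, phones in tokens:
        total += upper.logprob(prev, cls)  # inter-word: class N-gram
        if cls in lower:
            total += lower[cls].logprob(phones)  # intra-word: phonotactics
        else:
            counts = lexicon[cls]  # closed class: word must be in the lexicon
            total += math.log(counts[word] / sum(counts.values()))
        prev = cls
    return total + upper.logprob(prev, EOS)


# Toy data (hypothetical): the name list is separate from the task corpus.
lower = {"NAME": PhoneBigram([list("tanaka"), list("suzuki"), list("satoo")])}
upper = ClassBigram([["PRON", "VERB", "NAME"], ["PRON", "VERB", "NAME"]])
lexicon = {"PRON": Counter({"I": 1}), "VERB": Counter({"am": 1})}

# "Kimura" appears in neither training set, yet it is scored like any name.
tokens = [("I", "PRON", None), ("am", "VERB", None), ("Kimura", "NAME", list("kimura"))]
print(sentence_logprob(tokens, upper, lexicon, lower))
```

The point of the split is visible in the usage lines: the upper model is built from task sentences, while the lower model is built from a word list only. Either layer can therefore be retrained or adapted without touching the other.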