Paper information - Using story topics for language model adaptation

Using story topics for language model adaptation

The subject matter of any conversation or document can typically be described as some combination of elemental topics. We have developed a language model adaptation scheme that takes a piece of text, chooses the most similar topic clusters from a set of over 5000 elemental topics, and uses topic-specific language models built from those clusters to rescore N-best lists. With this adaptation we achieve a 15% reduction in perplexity and a small improvement in word error rate (WER). We also investigate the use of a topic tree, in which the amount of training data for a specific topic can be judiciously increased when the elemental topic cluster has too few word tokens to build a reliably smoothed and representative language model. Our system fine-tunes topic adaptation by interpolating models chosen from thousands of topics, which allows adaptation to unique, previously unseen combinations of subjects.
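The pipeline described in the abstract (select similar topic clusters, build topic-specific models, interpolate them) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: cosine similarity over raw word counts stands in for the unspecified similarity measure, add-one smoothed unigram models stand in for the actual language models, the interpolation weights are fixed by hand, and the function names (`top_topics`, `unigram_model`, `interpolate`, `perplexity`) are hypothetical. N-best rescoring and the topic tree are not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count dictionaries."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_topics(text_tokens, clusters, k):
    """Return the names of the k clusters most similar to the text.

    Assumption: similarity is cosine over raw counts; the paper's
    actual measure is not given in the abstract.
    """
    query = Counter(text_tokens)
    ranked = sorted(
        clusters,
        key=lambda name: cosine(query, Counter(clusters[name])),
        reverse=True,
    )
    return ranked[:k]


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram model; every token must be in vocab."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(models, weights):
    """Linear interpolation of models that share one vocabulary."""
    vocab = models[0].keys()
    return {
        w: sum(lam * m[w] for lam, m in zip(weights, models))
        for w in vocab
    }


def perplexity(model, tokens):
    """Per-word perplexity of the token sequence under the model."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy data (hypothetical): three tiny "topic clusters" and one test story.
clusters = {
    "sports": "the team won the game".split(),
    "finance": "the bank raised the rate".split(),
    "weather": "rain and wind hit the coast".split(),
}
story = "the team lost the game".split()

general_tokens = [w for toks in clusters.values() for w in toks]
vocab = set(general_tokens) | set(story)

general = unigram_model(general_tokens, vocab)
chosen = top_topics(story, clusters, k=1)
topic_models = [unigram_model(clusters[name], vocab) for name in chosen]

# Equal hand-picked weights; a real system would estimate them.
weights = [0.5] + [0.5 / len(topic_models)] * len(topic_models)
adapted = interpolate([general] + topic_models, weights)

print("selected topics:", chosen)
print("general perplexity:", perplexity(general, story))
print("adapted perplexity:", perplexity(adapted, story))
```

In a fuller system the interpolation weights would be estimated on held-out data rather than fixed, and the adapted model would be used to rescore N-best hypotheses from a recognizer instead of only being scored for perplexity.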

Ronald Rosenfeld | Kristie Seymore

[1] Slava M. Katz, et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer, 1987, IEEE Trans. Acoust. Speech Signal Process.

[2] G. Salton, et al. Developments in Automatic Text Retrieval, 1991, Science.

[3] Mari Ostendorf, et al. Modeling long distance dependence in language: topic mixtures vs. dynamic cache models, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[4] Beth A. Carlson. Unsupervised topic clustering of switchboard speech messages, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[5] Ronald Rosenfeld, et al. Large-Scale Topic Detection and Language Model Adaptation, 1997.

[6] Richard M. Stern, et al. The 1996 Hub-4 Sphinx-3 System, 1997.

[7] Anthony J. Robinson, et al. Language model adaptation using mixtures and an exponentially decaying cache, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8] Stanley F. Chen, et al. Language and Pronunciation Modeling in the CMU 1996 Hub 4 Evaluation, 1999.