Paper Information - Topic Detection for Language Model Adaptation of Highly-Inflected Languages by Using a Fuzzy Comparison Function


A new framework is proposed for constructing corpus-based, topic-adapted language models for large-vocabulary speech recognition of the highly-inflected Slovenian language. The proposed techniques can be applied to other Slavic languages, in which words are formed with many different inflectional affixes. This article describes an attempt to overcome two important difficulties of highly-inflected languages: a high out-of-vocabulary rate and the problem of topic detection. The first problem is addressed by decomposing words into stems and endings, and topic detection is improved by a novel feature-extraction approach based on soft comparison of words. The results of experiments on the second largest Slovenian newspaper news
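The two ideas named in the abstract, splitting words into stems and endings and comparing words softly rather than exactly, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not give the paper's actual fuzzy comparison function or ending inventory, so the prefix-based similarity, the `HYPOTHETICAL_ENDINGS` list, the threshold value, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical ending inventory, for illustration only (not taken from the paper).
HYPOTHETICAL_ENDINGS = ("ega", "emu", "ih", "a", "e", "i", "o", "u")


def split_stem_ending(word, endings=HYPOTHETICAL_ENDINGS):
    """Split a word into (stem, ending), trying the longest endings first."""
    for ending in sorted(endings, key=len, reverse=True):
        # Keep at least two characters in the stem.
        if word.endswith(ending) and len(word) - len(ending) >= 2:
            return word[: -len(ending)], ending
    return word, ""


def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def fuzzy_similarity(a, b):
    """Assumed soft match in [0, 1]: shared prefix length over the longer word's length."""
    if not a or not b:
        return 0.0
    return common_prefix_len(a, b) / max(len(a), len(b))


def soft_term_counts(tokens, vocabulary, threshold=0.6):
    """Accumulate soft counts: each token adds its similarity to every close-enough term."""
    counts = defaultdict(float)
    for token in tokens:
        for term in vocabulary:
            score = fuzzy_similarity(token.lower(), term.lower())
            if score >= threshold:
                counts[term] += score
    return dict(counts)


if __name__ == "__main__":
    print(split_stem_ending("vlade"))
    print(soft_term_counts(["vlada", "vlade", "banki"], ["vlada", "banka"]))
```

With exact matching, the inflected forms "vlade" and "banki" would contribute nothing to the counts for "vlada" and "banka". Under the assumed soft comparison they contribute a partial count, which is the kind of effect a soft comparison of words is meant to achieve for topic features.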

Mirjam Sepesy Mau | Zdravko Ka
