Paper information - To recover from speech recognition errors in spoken document retrieval


An important difference between the retrieval of spoken and written documents is that the indexing of speech data is usually based on automatic speech transcripts that contain recognition errors. However, there are several ways to reduce the effect of incorrect index terms on retrieval. This paper presents retrieval experiments with an unlimited-vocabulary speech recognizer that uses a lexicon of morpheme-like units discovered in an unsupervised manner. Based on this recognizer, three methods for error recovery are evaluated. First, the recognized words are expanded by also adding the recognized morphemes. Second, the words are expanded by adding the best rival morpheme candidates that were pruned away by the recognizer. Third, the queries are expanded with potentially relevant terms found in text documents that were retrieved from parallel text corpora using the original queries. The best results are obtained with the third method, which significantly improves precision compared to the original queries and brings spoken document retrieval precision to the same level as the corresponding text document retrieval.
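The third method, query expansion from a parallel text corpus, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `rank_documents` and `expand_query`, the simple TF-IDF overlap ranking, the selection of expansion terms by raw frequency in the top-ranked documents, and the toy corpus.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def rank_documents(query_terms, docs):
    """Return indices of docs with a positive TF-IDF overlap score, best first."""
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    # Document frequency of each term across the corpus.
    df = Counter(t for toks in tokenized for t in set(toks))
    scored = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(
            tf[t] * math.log((n + 1) / (df[t] + 1))
            for t in query_terms
            if t in tf
        )
        scored.append((score, i))
    # Sort by descending score, breaking ties by document index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for score, i in scored if score > 0]


def expand_query(query, text_corpus, top_docs=2, extra_terms=2):
    """Add the most frequent new terms from the top-ranked text documents."""
    query_terms = tokenize(query)
    top = rank_documents(query_terms, text_corpus)[:top_docs]
    counts = Counter()
    for i in top:
        counts.update(t for t in tokenize(text_corpus[i]) if t not in query_terms)
    # Most frequent first; alphabetical order breaks ties deterministically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [t for t, _ in ordered[:extra_terms]]


# Toy parallel text corpus (illustrative data only).
text_corpus = [
    "election results parliament vote election",
    "weather forecast rain tomorrow",
    "parliament vote budget election",
]
expanded = expand_query("election", text_corpus)
print(expanded)
```

The expanded term list would then be run against the index built from the speech transcripts, so that a relevant spoken document can still match even when the original query term was misrecognized.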

Mikko Kurimo | Ville T. Turunen

[1] Eero Sormunen, et al. A Method for Measuring Wide Range Performance of Boolean Queries in Full-Text Databases, 2000.

[2] Kevin Barraclough, et al. I and i, 2001, BMJ: British Medical Journal.

[3] Mathias Creutz, et al. Unsupervised Discovery of Morphemes, 2002, SIGMORPHON.

[4] Mikko Kurimo, et al. Unlimited vocabulary speech recognition with morph language models applied to Finnish, 2006, Comput. Speech Lang.

[5] Mikko Kurimo, et al. An evaluation of a spoken document retrieval baseline system in Finnish, 2004, INTERSPEECH.

[6] Steve Renals, et al. Indexing and retrieval of broadcast news, 2000, Speech Commun.

[7] Mikko Kurimo, et al. On lexicon creation for Turkish LVCSR, 2003, INTERSPEECH.

[8] Mikko Kurimo, et al. Unlimited vocabulary speech recognition based on morphs discovered in an unsupervised manner, 2003, INTERSPEECH.

[9] Ellen M. Voorhees, et al. The TREC Spoken Document Retrieval Track: A Success Story, 2000, TREC.