Results for Variable Speaker and Recording Conditions on Spoken IR in Finnish

The performance of current spoken information retrieval (IR) systems depends on how well automatic speech recognition (ASR) can transcribe the material for indexing. Besides the design of the ASR system itself, ASR performance is strongly affected by the recording conditions, the speakers, the speaking style, and the speech content. However, the average word error rate of the ASR output is not a relevant measure for spoken IR, where only the extracted index terms or keywords matter. In this paper, we measure spoken IR performance on varied material, ranging from controlled single-speaker news reading to real-world broadcasts with variable conditions, speakers, and background noise. We also study the effect of multicondition acoustic models and online adaptation, as well as the controlled addition of background babble noise. The experiments are performed in Finnish, an agglutinative and highly inflected language, using morph-based language modelling.
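The distinction between word error rate and an index-term view of recognition errors can be illustrated with a minimal sketch. It is not the evaluation used in the paper: the `term_error_rate` function follows a common order-independent count-mismatch definition, and the stopword list, the English sentences, and all function names are assumptions chosen for illustration only (the paper itself works with Finnish and morph-based units).

```python
from collections import Counter


def word_error_rate(ref, hyp):
    """Word-level Levenshtein distance divided by the reference length."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        cur = [i]
        for j, hw in enumerate(h, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (rw != hw)))  # substitution or match
        prev = cur
    return prev[-1] / len(r)


def term_error_rate(ref, hyp, stopwords):
    """Order-independent count mismatch, computed over index terms only."""
    r = Counter(w for w in ref.split() if w not in stopwords)
    h = Counter(w for w in hyp.split() if w not in stopwords)
    mismatch = sum(abs(r[t] - h[t]) for t in set(r) | set(h))
    return mismatch / sum(r.values())


# Hypothetical stopword list and transcripts, for illustration only.
STOPWORDS = {"the", "in", "a", "of"}
ref = "the president visited helsinki in the morning"
hyp = "a president visited helsinki of a morning"

print(f"WER = {word_error_rate(ref, hyp):.2f}")
print(f"TER = {term_error_rate(ref, hyp, STOPWORDS):.2f}")
```

In this toy case every recognition error falls on a function word, so the word error rate is 3/7 while the index terms are recovered perfectly. A transcript with the same word error rate but errors on the content words would be far more damaging for retrieval.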
