Performance Analysis and Improvement of Turkish Broadcast News Retrieval

This paper presents our work on the retrieval of spoken information in Turkish. Traditional speech retrieval systems perform indexing and retrieval over automatic speech recognition (ASR) transcripts, which contain errors caused either by out-of-vocabulary (OOV) words or by ASR inaccuracy. We use subword units as recognition and indexing units to reduce the OOV rate, and we index alternative recognition hypotheses to handle ASR errors. The performance of these methods is evaluated on our Turkish Broadcast News Corpus with two types of speech retrieval systems: a spoken term detection (STD) system and a spoken document retrieval (SDR) system. To evaluate the SDR system, we also build a spoken information retrieval (IR) collection, which is the first for Turkish. Experiments show that word segmentation algorithms are useful for both tasks. SDR performance is observed to be less dependent on the ASR component, whereas any change in ASR performance directly affects STD. We also present an extensive analysis of retrieval performance as a function of query length, and propose length-based index combination and thresholding strategies for the STD task. Finally, a new approach that detects stems instead of complete terms is tested for STD and gives promising results. Although the evaluations were performed on Turkish, we expect the proposed methods to be effective for similar languages as well.
