Vocabulary independent spoken term detection

We are interested in retrieving information from speech data like broadcast news, telephone conversations and roundtable meetings. Today, most systems use large vocabulary continuous speech recognition tools to produce word transcripts; the transcripts are indexed and query terms are retrieved from the index. However, query terms that are not part of the recognizer's vocabulary cannot be retrieved, and the recall of the search is affected. In addition to the output word transcript, advanced systems provide also phonetic transcripts, against which query terms can be matched phonetically. Such phonetic transcripts suffer from lower accuracy and cannot be an alternative to word transcripts.We present a vocabulary independent system that can handle arbitrary queries, exploiting the information provided by having both word transcripts and phonetic transcripts. A speech recognizer generates word confusion networks and phonetic lattices. The transcripts are indexed for query processing and ranking purpose.The value of the proposed method is demonstrated by the relative high performance ofour system, which received the highest overall ranking for US English speech data in the recent NIST Spoken Term Detection evaluation.

[1]  Sridha Sridharan,et al.  Dynamic match phone-lattice searches for very fast and accurate unrestricted vocabulary keyword spotting , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[2]  Peng Yu,et al.  Vocabulary-independent search in spontaneous speech , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  Karen Spärck Jones,et al.  Open-vocabulary speech indexing for voice and video mail retrieval , 1997, MULTIMEDIA '96.

[4]  Richard Sproat,et al.  Lattice-Based Search for Spoken Utterance Retrieval , 2004, NAACL.

[5]  Ellen M. Voorhees,et al.  The TREC Spoken Document Retrieval Track: A Success Story , 2000, TREC.

[6]  Dilek Z. Hakkani-Tür,et al.  A general algorithm for word graph matrix decomposition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[7]  Alon Efrat,et al.  Advances in phonetic word spotting , 2001, CIKM '01.

[8]  Karen Spärck Jones,et al.  Effects of out of vocabulary words in spoken document retrieval (poster session) , 2000, SIGIR '00.

[9]  David Carmel,et al.  Juru at TREC 10 - Experiments with Index Pruning , 2001, TREC.

[10]  Alvin F. Martin,et al.  The DET curve in assessment of detection task performance , 1997, EUROSPEECH.

[11]  Stanley F. Chen,et al.  Conditional and joint models for grapheme-to-phoneme conversion , 2003, INTERSPEECH.

[12]  Beth Logan,et al.  An experimental study of an audio indexing system for the web , 2000, INTERSPEECH.

[13]  Peng Yu,et al.  Fast two-stage vocabulary independent search in spontaneous speech , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[14]  Haim H. Permuter,et al.  Mutual relevance feedback for multimodal query formulation in video retrieval , 2005, MIR '05.

[15]  Alex Acero,et al.  Position Specific Posterior Lattices for Indexing Speech , 2005, ACL.

[16]  David A. James,et al.  A system for unrestricted topic retrieval from radio news broadcasts , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[17]  Geoffrey Zweig,et al.  The IBM 2004 conversational telephony system for rich transcription , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[18]  David Carmel,et al.  Spoken document retrieval from call-center conversations , 2006, SIGIR.

[19]  Kenney Ng,et al.  Subword-based approaches for spoken document retrieval , 2000, Speech Commun..

[20]  Andreas Stolcke,et al.  Finding consensus in speech recognition: word error minimization and other applications of confusion networks , 2000, Comput. Speech Lang..

[21]  Cyril Allauzen,et al.  General Indexation of Weighted Automata - Application to Spoken Utterance Retrieval , 2004, HLT-NAACL 2004.

[22]  David Anthony James,et al.  The Application of Classical Informa - tion Retrieval Techniques to Spoken Documents , 1995 .

[23]  Alex Acero,et al.  Indexing uncertainty for spoken document search , 2005, INTERSPEECH.

[24]  Amit Singhal,et al.  AT&T at TREC-7 , 1998, TREC.

[25]  Mark A. Clements,et al.  Phonetic searching applied to on-line distance learning modules , 2002, Proceedings of 2002 IEEE 10th Digital Signal Processing Workshop, 2002 and the 2nd Signal Processing Education Workshop..

[26]  Amit Singhal,et al.  Document expansion for speech retrieval , 1999, SIGIR '99.