Paper information - Effect of pronounciations on OOV queries in spoken term detection

Effect of pronounciations on OOV queries in spoken term detection

The spoken term detection (STD) task aims to return relevant segments from a spoken archive that contain the query terms, whether or not they are in the system vocabulary. This paper focuses on pronunciation modeling for out-of-vocabulary (OOV) terms, which frequently occur in STD queries. The STD system described in this paper uses Weighted Finite State Transducers (WFSTs) to index word-level and sub-word-level lattices or confusion networks produced by an LVCSR system. We investigate the inclusion of n-best pronunciation variants for OOV terms (obtained from letter-to-sound rules) in the search and present the results obtained by indexing confusion networks as well as lattices. Two observations are worth mentioning: phone indexes generated from sub-words represent OOVs well, and too many variants for the OOV terms degrade performance if the pronunciations are not weighted.
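The last observation, that many unweighted pronunciation variants can hurt, can be illustrated with a toy sketch. This is not the paper's WFST-based system: the phone index is assumed to be a plain dictionary from phone sequences to (utterance, posterior) pairs, and the names `score_query` and `detect`, the pronunciation variants, their probabilities, and the 0.5 threshold are all hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple

Pron = Tuple[str, ...]                       # a phone sequence
Index = Dict[Pron, List[Tuple[str, float]]]  # phones -> [(utterance id, posterior)]


def score_query(variants: List[Tuple[Pron, float]], index: Index,
                weighted: bool) -> Dict[str, float]:
    """Accumulate a score per utterance over all pronunciation variants.

    With weighted=True each hit is scaled by the variant's probability;
    with weighted=False every variant counts as much as the best one.
    """
    scores: Dict[str, float] = {}
    for phones, prob in variants:
        weight = prob if weighted else 1.0
        for utt, posterior in index.get(phones, []):
            scores[utt] = scores.get(utt, 0.0) + weight * posterior
    return scores


def detect(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the utterances whose score reaches the decision threshold."""
    return sorted(utt for utt, s in scores.items() if s >= threshold)


# Hypothetical n-best pronunciations for one OOV term (assumed probabilities).
variants = [(("k", "ae", "t"), 0.7),
            (("k", "ey", "t"), 0.2),
            (("s", "ae", "t"), 0.1)]

# Hypothetical phone index: utt2 matches only the least likely variant.
index: Index = {("k", "ae", "t"): [("utt1", 0.9)],
                ("s", "ae", "t"): [("utt2", 0.8)]}

unweighted_hits = detect(score_query(variants, index, weighted=False), 0.5)
weighted_hits = detect(score_query(variants, index, weighted=True), 0.5)
```

In this toy setup the unweighted search returns both utterances, because the unlikely third variant is trusted as much as the first, while the weighted search keeps only `utt1`. Whether the extra hit is a false alarm depends on the data; the sketch only shows how weighting changes which hits clear a fixed threshold.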

[1] Herbert Gish, et al. Rapid and accurate spoken term detection, 2007, INTERSPEECH.

[2] Karen Spärck Jones, et al. Effects of out of vocabulary words in spoken document retrieval (poster session), 2000, SIGIR '00.

[3] Peng Yu, et al. Vocabulary-independent search in spontaneous speech, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Bhuvana Ramabhadran, et al. Vocabulary independent spoken term detection, 2007, SIGIR.

[5] Peng Yu, et al. Towards Spoken-Document Retrieval for the Internet: Lattice Indexing For Large-Scale Web-Search Architectures, 2006, NAACL.

[6] Ellen M. Voorhees, et al. The TREC Spoken Document Retrieval Track: A Success Story, 2000, TREC.

[7] Johan Schalkwyk, et al. OpenFst: A General and Efficient Weighted Finite-State Transducer Library, 2007, CIAA.

[8] Siddika Parlak, et al. Spoken term detection for Turkish Broadcast News, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[9] Richard Sproat, et al. Lattice-Based Search for Spoken Utterance Retrieval, 2004, NAACL.

[10] Beth Logan, et al. Confusion-based query expansion for OOV words in spoken document retrieval, 2002, INTERSPEECH.

[11] Michael Picheny, et al. Improvements in phone based audio search via constrained match with high order confusion estimates, 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[12] Fernando Pereira, et al. Weighted Automata in Text and Speech Processing, 2005, ArXiv.

[13] Geoffrey Zweig, et al. The IBM 2004 conversational telephony system for rich transcription, 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[14] Alex Acero, et al. Position Specific Posterior Lattices for Indexing Speech, 2005, ACL.

[15] Olivier Siohan, et al. Fast vocabulary-independent audio search using path-based graph indexing, 2005, INTERSPEECH.

[16] Mark A. Clements, et al. Phonetic searching applied to on-line distance learning modules, 2002, Proceedings of 2002 IEEE 10th Digital Signal Processing Workshop, 2002 and the 2nd Signal Processing Education Workshop.

[17] Pak-Chung Ching, et al. Query expansion using phonetic confusions for Chinese spoken document retrieval, 2000, IRAL '00.

[18] Cyril Allauzen, et al. General Indexation of Weighted Automata - Application to Spoken Utterance Retrieval, 2004, HLT-NAACL 2004.