Word-lattice based spoken-document indexing with standard text indexers

Indexing the spoken content of audio recordings requires automatic speech recognition, which remains unreliable. Unlike with text, a speech recognizer cannot tell us with certainty whether a word is present at a given point in the audio; it can only provide a probability. Using these probabilities correctly significantly improves spoken-document search accuracy. In this paper, we first describe how to improve accuracy for "web-search style" (AND/phrase) queries into audio by using speech recognition alternates and word posterior probabilities derived from word lattices. We then present an end-to-end approach for doing so with standard text indexers, which by design cannot handle probabilities or unaligned alternates. We present a sequence of approximations that transforms the numeric lattice-matching problem into a symbolic, text-based one that a commercial full-text indexer can implement. Experiments on a 170-hour lecture set show accuracy improvements of 30-60% for phrase searches and 130% for two-term AND queries, compared to indexing linear text.
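To illustrate why word posteriors and recognition alternates matter for AND queries, the following is a minimal sketch and not the method of this paper. It assumes that each audio segment has already been reduced to a flat list of (word, posterior) hypotheses, and that hypotheses can be treated as independent, which is a simplification. The names `term_presence` and `and_score` and the example posteriors are hypothetical.

```python
from math import prod

def term_presence(hyps, term):
    """Probability that `term` occurs at least once in the segment.

    Assumption: hypotheses are independent, so the term is absent
    only if every hypothesis carrying it is wrong.
    """
    return 1.0 - prod(1.0 - p for w, p in hyps if w == term)

def and_score(hyps, terms):
    """Score for a multi-term AND query: product of per-term presence."""
    return prod(term_presence(hyps, t) for t in terms)

# Hypothetical (word, posterior) pairs for one segment, including alternates.
hyps = [
    ("speech", 0.9), ("beach", 0.1),
    ("recognition", 0.6), ("wreck", 0.3),
    ("recognition", 0.5),
]

score = and_score(hyps, ["speech", "recognition"])
missing = term_presence(hyps, "lattice")
```

A term that appears only as a lower-ranked alternate still receives a nonzero score here, whereas an index built from the single best transcript would miss it entirely.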
