Performance Analysis for Lattice-Based Speech Indexing Approaches Using Words and Subword Units

Lattice-based speech indexing approaches are attractive for the combination of short spoken segments, short queries, and low automatic speech recognition (ASR) accuracies, as lattices provide recognition alternatives and therefore tend to compensate for recognition errors. Position-specific posterior lattices (PSPLs) and confusion networks (CNs), two of the most popular lattice-based approaches, both reduce disk space requirements and are more efficient than raw lattices. When PSPLs and CNs are used in a word-based fashion, they cannot handle OOV or rare word queries. In this paper, we propose an efficient approach for the construction of subword-based PSPLs (S-PSPLs) and CNs (S-CNs) and present a comprehensive performance analysis of PSPL and CN structures using both words and subword units, taking into account basic principles and structures, and supported by experimental results on Mandarin Chinese. S-PSPLs and S-CNs are shown to yield significant mean average precision (MAP) improvements over word-based PSPLs and CNs for both out-of-vocabulary (OOV) and in-vocabulary queries while requiring much less disk space for indexing.

[1]  Lin-Shan Lee,et al.  Analytical comparison between position specific posterior lattices and confusion networks based on words and subword units for spoken document indexing , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[2]  Ellen M. Voorhees,et al.  The TREC Spoken Document Retrieval Track: A Success Story , 2000, TREC.

[3]  Mikko Kurimo,et al.  Indexing confusion networks for morph-based spoken document retrieval , 2007, SIGIR.

[4]  Lin-Shan Lee,et al.  Improved Large Vocabulary Continuous Chinese Speech Recognition by Character-Based Consensus Networks , 2006, ISCSLP.

[5]  Alex Acero,et al.  Position Specific Posterior Lattices for Indexing Speech , 2005, ACL.

[6]  Beth Logan,et al.  Approaches to reduce the effects of OOV queries on indexed spoken audio , 2005, IEEE Transactions on Multimedia.

[7]  Andreas Stolcke,et al.  Finding consensus in speech recognition: word error minimization and other applications of confusion networks , 2000, Comput. Speech Lang..

[8]  Bhuvana Ramabhadran,et al.  Vocabulary independent spoken term detection , 2007, SIGIR.

[9]  Peng Yu,et al.  Towards Spoken-Document Retrieval for the Internet: Lattice Indexing For Large-Scale Web-Search Architectures , 2006, NAACL.

[10]  Lalit R. Bahl,et al.  A Maximum Likelihood Approach to Continuous Speech Recognition , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  James R. Glass,et al.  Open-Vocabulary Spoken Utterance Retrieval using Confusion Networks , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[12]  Lee-Feng Chien,et al.  PAT-tree-based keyword extraction for Chinese information retrieval , 1997, SIGIR '97.

[13]  Richard Sproat,et al.  Lattice-Based Search for Spoken Utterance Retrieval , 2004, NAACL.

[14]  Hermann Ney,et al.  Open vocabulary speech recognition with flat hybrid models , 2005, INTERSPEECH.

[15]  Amanda Spink,et al.  Real life information retrieval: a study of user queries on the Web , 1998, SIGF.

[16]  Satoshi Nakamura,et al.  Generalized posterior probability for minimizing verification errors at subword, word and sentence levels , 2004, 2004 International Symposium on Chinese Spoken Language Processing.

[17]  David Carmel,et al.  Spoken document retrieval from call-center conversations , 2006, SIGIR.

[18]  Lin-Shan Lee,et al.  Statistics-based segment pattern lexicon-a new direction for Chinese language modeling , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[19]  Chorkin Chan,et al.  Chinese Word Segmentation based on Maximum Matching and Word Binding Force , 1996, COLING.

[20]  Kenney Ng,et al.  Subword-based approaches for spoken document retrieval , 2000, Speech Commun..

[21]  Hermann Ney,et al.  Confidence measures for large vocabulary continuous speech recognition , 2001, IEEE Trans. Speech Audio Process..

[22]  Peng Yu,et al.  Vocabulary-independent search in spontaneous speech , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23]  Frank K. Soong,et al.  Tone-Enhanced Generalized Character Posterior Probability (GCPP) for Cantonese LVCSR , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[24]  Alex Acero,et al.  Soft indexing of speech content for search in spoken documents , 2007, Comput. Speech Lang..

[25]  Lin-Shan Lee,et al.  Robustness analysis on lattice-based speech indexing approaches with respect to varying recognition accuracies by refined simulations , 2008, 2008 IEEE Spoken Language Technology Workshop.

[26]  Lin-Shan Lee,et al.  Discriminating capabilities of syllable-based features and approaches of utilizing them for voice retrieval of speech information in Mandarin Chinese , 2002, IEEE Trans. Speech Audio Process..

[27]  Peng Yu,et al.  A hybrid word / phoneme-based approach for improved vocabulary-independent search in spontaneous speech , 2004, INTERSPEECH.

[28]  Peng Yu,et al.  Vocabulary-independent indexing of spontaneous speech , 2005, IEEE Transactions on Speech and Audio Processing.

[29]  Andreas Stolcke,et al.  Open-vocabulary spoken term detection using graphone-based hybrid recognition systems , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[30]  Lie Lu,et al.  Searching the Audio Notebook: Keyword Search in Recorded Conversation , 2005, HLT.

[31]  Siddika Parlak,et al.  Spoken term detection for Turkish Broadcast News , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[32]  Lin-Shan Lee,et al.  Subword-based position specific posterior lattices (s-PSPL) for indexing speech information , 2007, INTERSPEECH.

[33]  Timothy J. Hazen,et al.  Retrieval and browsing of spoken content , 2008, IEEE Signal Processing Magazine.

[34]  Beth Logan,et al.  Word and sub-word indexing approaches for reducing the effects of OOV queries on spoken audio , 2002 .

[35]  Donna K. Harman,et al.  Overview of the Fourth Text REtrieval Conference (TREC-4) , 1995, TREC.