Robustness analysis on lattice-based speech indexing approaches with respect to varying recognition accuracies by refined simulations

We analyze the robustness of different lattice-based speech indexing approaches with respect to varying recognition accuracies. Such an analysis is important but, to our knowledge, has been neglected in prior work. To compensate for the lack of corpora with diverse noise characteristics, we use refined approaches that simulate feature vector sequences directly from HMMs, covering a wide range of recognition accuracies, rather than simply adding noise and channel distortion to existing noisy corpora. We then compare and discuss the robustness of several state-of-the-art speech indexing approaches.
