Spoken Term Detection Based on Improved Index Structure

The performance of a keyword spotting system degrades severely when the indexing stage runs so fast that the lattice loses much of the information needed to retrieve spoken terms. In this paper, we focus on this problem and present two algorithms: unconstrained word graph expansion (UWGE) and the dynamic position specific posterior lattice (D-PSPL). The motivation for both methods is to retain hypotheses that are pruned during decoding but may nevertheless be correct. The proposed approaches remove the N-gram language model state constraint from the lattice and reconstruct the lattice as an unconstrained word graph. On two Mandarin conversational telephone speech sets, we compare the two methods against a traditional trigram lattice, and both give satisfactory performance gains over it. The experimental results also show that the D-PSPL algorithm outperforms the UWGE algorithm in the high-score region.
