Semantic cache model driven speech recognition

This paper proposes an improved semantic-based cache model: our method essentially uses the first pass of the ASR system, together with its confidence scores and semantic fields, to drive the second pass. In previous papers, we introduced a Driven Decoding Algorithm (DDA), which combines speech recognition systems by guiding the search algorithm of a primary ASR system with the one-best hypothesis of an auxiliary system. We propose a strategy that uses DDA to drive a semantic cache according to the confidence measures. Combining the semantic cache with DDA optimizes the new decoding process, acting as a form of unsupervised language model adaptation. Experiments evaluate the proposed method on 8 hours of speech. Results show that semantic-DDA yields significant improvements over the baseline: a 4% relative word error rate improvement without acoustic adaptation, and 1.9% after adaptation, with a 3xRT ASR system.
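The abstract does not give the model equations, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a classic linearly interpolated cache language model, P(w | h) = (1 - lam) * P_base(w | h) + lam * P_cache(w). The cache is filled with the first-pass words whose confidence reaches a threshold, plus the members of their semantic fields. The class SemanticCache, the semantic-field dictionary, the threshold value and the weight lam are all hypothetical. The DDA alignment that steers the second-pass search is not modeled here.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Set, Tuple


class SemanticCache:
    """Hypothetical semantic cache built from a first-pass ASR hypothesis.

    Assumption: words whose confidence is at least `threshold` are trusted,
    and each trusted word also pulls its semantic-field neighbours into the cache.
    """

    def __init__(self, semantic_fields: Dict[str, Set[str]], threshold: float = 0.7):
        self.semantic_fields = semantic_fields  # hypothetical word -> related words map
        self.threshold = threshold
        self.counts: Counter = Counter()

    def fill(self, first_pass: Iterable[Tuple[str, float]]) -> None:
        """Add confident first-pass words and their semantic-field neighbours."""
        for word, confidence in first_pass:
            if confidence < self.threshold:
                continue  # low-confidence words are likely errors; keep them out
            self.counts[word] += 1
            for related in self.semantic_fields.get(word, ()):
                self.counts[related] += 1

    def prob(self, word: str) -> float:
        """Relative frequency of `word` in the cache (0.0 if the cache is empty)."""
        total = sum(self.counts.values())
        return self.counts[word] / total if total else 0.0


def adapted_prob(
    word: str,
    history: Tuple[str, ...],
    base_lm: Callable[[str, Tuple[str, ...]], float],
    cache: SemanticCache,
    lam: float = 0.1,
) -> float:
    """Linear interpolation: (1 - lam) * P_base(w | h) + lam * P_cache(w)."""
    return (1.0 - lam) * base_lm(word, history) + lam * cache.prob(word)


if __name__ == "__main__":
    cache = SemanticCache({"election": {"vote", "ballot"}}, threshold=0.7)
    # "results" is below the threshold, so the cache holds: the, election, vote, ballot
    cache.fill([("the", 0.95), ("election", 0.9), ("results", 0.4)])
    flat_lm = lambda w, h: 0.001  # stand-in for a real n-gram model
    print(adapted_prob("vote", ("the",), flat_lm, cache, lam=0.1))
```

With this toy input, "results" is rejected for its low confidence, while "vote" enters the cache only through the semantic field of "election". This is the intended effect of a semantic cache: it boosts topic-related words that the first pass may not have output itself.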

Georges Linarès | Benjamin Lecouteux | Pascal Nocera
