Continuous speech recognition using attention shift decoding with soft decision

We present an attention shift decoding (ASD) method inspired by human speech recognition. In contrast to traditional automatic speech recognition (ASR) systems, ASD decodes speech non-consecutively according to reliability criteria: the gaps (unreliable speech regions) are decoded using evidence from the islands (reliable speech regions). On the BU Radio News Corpus, ASD provides a significant improvement (2.9% absolute) over the baseline ASR results when it is used with oracle island-gap information. At the core of the ASD method is automatic island-gap detection. Here, we propose a new feature set for automatic island-gap detection that achieves 83.7% accuracy. To cope with the imperfect nature of island-gap classification, we also propose a new ASD algorithm using soft decision. ASD with soft decision provides a 0.4% absolute (2.2% relative) improvement over the baseline ASR results when it is used with automatically detected islands and gaps.

Index Terms: speech recognition, decoding, attention, island.
