Turbo Processing for Speech Recognition

Speech recognition is a classic example of a human/machine interface, typifying many of the difficulties and opportunities of human/machine interaction. In this paper, speech recognition is used as an example of applying turbo processing principles to the general human/machine interface problem. Speech recognizers frequently involve a model representing phonemic information at a local level, followed by a language model representing information at a nonlocal level. This structure is analogous to the local (e.g., equalizer) and nonlocal (e.g., error-correction decoder) elements common in digital communications. Drawing on the analogy with turbo processing in digital communications, turbo speech processing iteratively feeds the output of the language model back to the phonemic model, where it serves as prior probabilities. This analogy is developed here, and the performance of the turbo model is characterized using an artificial language model. With turbo processing, the relative error rate improves significantly, especially in high-noise settings.
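
As a rough illustration of the iterative exchange described above, the following Python sketch alternates between a frame-level phonemic stage and a bigram "language model" stage, passing extrinsic information from one back to the other as priors. This is a minimal toy sketch, not code or data from the paper: the acoustic likelihoods, the phoneme bigram matrix, and the frame/class counts are all hypothetical placeholders chosen only to show the structure of the loop.

```python
# Toy sketch of turbo-style iteration between a local (phonemic) stage and a
# nonlocal (language-model) stage. All matrices and sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

T, P = 6, 4                                   # frames and phoneme classes (toy sizes)
acoustic_lik = rng.random((T, P))             # stand-in for p(observation_t | phoneme)
bigram = rng.random((P, P))                   # stand-in "language model": phoneme bigram scores
bigram /= bigram.sum(axis=1, keepdims=True)

prior = np.full((T, P), 1.0 / P)              # uniform priors on the first pass

def normalize(x):
    return x / x.sum(axis=1, keepdims=True)

for iteration in range(3):
    # Local (phonemic) stage: combine acoustic evidence with the current priors.
    posterior = normalize(acoustic_lik * prior)

    # Extrinsic information from the local stage: divide the prior back out of
    # the posterior so the language model is not fed its own previous output.
    extrinsic_local = normalize(posterior / prior)

    # Nonlocal (language-model) stage: smooth each frame using bigram
    # predictions propagated from the neighboring frames' extrinsic beliefs.
    lm_pred = np.ones_like(extrinsic_local)
    lm_pred[1:] *= extrinsic_local[:-1] @ bigram     # prediction from the left neighbor
    lm_pred[:-1] *= extrinsic_local[1:] @ bigram.T   # prediction from the right neighbor
    prior = normalize(lm_pred)                       # fed back as new priors

    print(f"iteration {iteration}: hard decisions", posterior.argmax(axis=1))
```

In an actual recognizer the local stage would be a full forward-backward pass over the acoustic HMM and the nonlocal stage a word-level language model; the sketch only mirrors the exchange of extrinsic information that the abstract describes.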
