A Graphical Model Based Decoder for Recognition of Loss-concealed VoIP Speech

In the recognition of Voice over Internet Protocol (VoIP) speech, packet losses pose a challenge that is generally addressed by packet loss concealment (PLC) techniques. However, improper concealment by these PLC techniques results in unreliable observations that adversely affect the Viterbi decoding step and cause misrecognitions. We propose a graphical-model-based decoding architecture that can skip unreliable observations, and hence the corresponding states, so that recognition accuracy improves in spite of improper concealment. The proposed skip decoder is validated experimentally on loss-concealed speech samples of isolated words. Results indicate that the skip decoder improves recognition accuracy for different burst-loss lengths and different PLC schemes.
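
The idea of skipping unreliable observations during decoding can be illustrated with a minimal sketch. This is not the graphical-model decoder proposed in the paper. It is a plain Viterbi variant, written under the assumption that each frame carries a hypothetical `reliable` flag (for example, set to `False` for frames produced by loss concealment). For flagged frames the emission score is ignored, so only the transition scores decide how the state path moves across the gap. The function name `viterbi_skip` and all parameter names are illustrative.

```python
NEG_INF = float("-inf")


def viterbi_skip(log_init, log_trans, log_emit, reliable):
    """Viterbi decoding that ignores emission scores of unreliable frames.

    log_init[s]     : log prior of starting in state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_emit[t][s]  : log likelihood of frame t under state s
    reliable[t]     : assumed flag; False marks a frame whose observation
                      should not influence the decoding
    Returns (best log score, best state path as a list of state indices).
    """
    n_states = len(log_init)
    n_frames = len(log_emit)

    def emit(t, s):
        # An unreliable frame contributes log(1) = 0, i.e. it is skipped.
        return log_emit[t][s] if reliable[t] else 0.0

    # Initialisation with the first frame.
    score = [log_init[s] + emit(0, s) for s in range(n_states)]
    backptr = []

    # Recursion: best predecessor for every state at every later frame.
    for t in range(1, n_frames):
        new_score, ptr = [], []
        for s in range(n_states):
            best_r = max(range(n_states),
                         key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[best_r] + log_trans[best_r][s] + emit(t, s))
            ptr.append(best_r)
        score = new_score
        backptr.append(ptr)

    # Backtracking from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path
```

Calling the function with every flag set to `True` gives standard Viterbi decoding. Setting the flag to `False` for a badly concealed frame prevents that frame's likelihoods from pulling the path toward a wrong state. How the reliability flags are obtained is left open here.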
