An Integrated Scheme for Robust Distributed Speech Recognition Over Lossy Packet Networks

In this work we present a complete set of techniques that provide robustness against frame losses in distributed speech recognition over packet-switched networks. The proposed scheme is composed of three techniques: two are applied at the sender and the third in the recognizer itself. First, a media-specific forward error correction (FEC) technique is used to allow the recovery of information within loss bursts. Second, the weighted Viterbi algorithm (WVA), a recognizer-based technique well known for its remarkable ability to reduce the effects of long runs of consecutive frame losses during recognition, is used to handle the additional information introduced by the FEC codes. Third, a double-stream strategy is applied, whereby interleaving can be combined with FEC codes without any increase in delay. Interleaving reduces the perceived burst length at the receiver, further improving recognition performance. As a result, the proposed scheme can provide acceptable performance even under extremely adverse channel conditions.
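
To illustrate why interleaving shortens the burst length perceived at the receiver, the following minimal sketch uses a generic square block interleaver. It is not the double-stream strategy proposed in this work: a plain block interleaver like this one adds buffering delay, which the proposed strategy is said to avoid. The function names, the interleaver depth of 4, and the position of the channel burst are all assumptions chosen for illustration.

```python
def block_interleave_order(n_frames, depth):
    """Transmission order for a depth x depth block interleaver (illustrative).

    Frames are written row by row into a depth x depth block and read out
    column by column. Entry i of the result is the index of the original
    frame sent in transmission slot i.
    """
    block = depth * depth
    if n_frames % block != 0:
        raise ValueError("n_frames must be a multiple of depth * depth")
    order = []
    for start in range(0, n_frames, block):
        for col in range(depth):
            for row in range(depth):
                order.append(start + row * depth + col)
    return order


def longest_burst(lost_frames, n_frames):
    """Length of the longest run of consecutive lost frames."""
    longest = run = 0
    for i in range(n_frames):
        run = run + 1 if i in lost_frames else 0
        longest = max(longest, run)
    return longest


N_FRAMES = 16
DEPTH = 4                      # assumed interleaver depth
burst_slots = range(4, 8)      # assumed channel burst: 4 consecutive slots lost

order = block_interleave_order(N_FRAMES, DEPTH)

# Without interleaving, the lost slots are the lost frames themselves.
plain_lost = set(burst_slots)
# With interleaving, each lost slot carried a different, non-adjacent frame.
interleaved_lost = {order[slot] for slot in burst_slots}

print("burst without interleaving:", longest_burst(plain_lost, N_FRAMES))
print("burst with interleaving:   ", longest_burst(interleaved_lost, N_FRAMES))
print("frames lost after de-interleaving:", sorted(interleaved_lost))
```

In this example a channel burst of four consecutive packets becomes four isolated single-frame losses once the receiver restores the original frame order, which is the kind of loss pattern that concealment and recognizer-based techniques handle more easily.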
