A robust scheme for distributed speech recognition over loss-prone packet channels

In this paper, we propose a complete recovery scheme designed to improve robustness against packet losses in distributed speech recognition (DSR) systems. The scheme integrates two sender-driven techniques, namely media-specific forward error correction (FEC) and frame interleaving, with a receiver-based error concealment (EC) technique, the weighted Viterbi algorithm (WVA). Although these techniques have already been tested separately, where they provided a significant performance increase in clean acoustic environments, in this paper they are applied jointly and their performance is evaluated in adverse acoustic conditions. In particular, a noisy speech database and the ETSI Advanced Front-end are used, and the dynamic features, which play an important role in adverse acoustic environments, are examined together with their confidences for the WVA. Combining two sender-driven techniques, each of which introduces a delay, would increase the overall latency if they were simply composed. To avoid this, we propose a double-stream scheme that limits the latency to the larger of the two techniques' delays. As a result, with very few overhead bits and a very limited delay, the integrated scheme achieves a significant improvement in the performance of a DSR system over a degraded transmission channel, in both clean and noisy acoustic conditions.
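
To make the receiver-side idea concrete, the following is a minimal sketch of a weighted Viterbi decoder. It assumes one common formulation of the WVA, in which the log-likelihood of each frame is scaled by a confidence between 0 and 1, so that a frame reconstructed after a packet loss contributes less to the path score than a correctly received one. The function name `weighted_viterbi`, its arguments, and the way confidences are supplied are illustrative assumptions; the abstract does not specify how the paper computes confidences for static or dynamic features.

```python
import math


def weighted_viterbi(log_init, log_trans, log_obs, confidences):
    """Viterbi decoding with per-frame confidence weights (illustrative sketch).

    log_init[j]     : log prior of starting in state j
    log_trans[i][j] : log probability of moving from state i to state j
    log_obs[t][j]   : log-likelihood of frame t under state j
    confidences[t]  : weight in [0, 1]; 1 trusts the frame fully,
                      0 ignores its likelihood (assumed convention)
    Returns (best_state_path, best_log_score).
    """
    n_states = len(log_init)
    # Initialise with the first frame, scaled by its confidence.
    delta = [log_init[j] + confidences[0] * log_obs[0][j]
             for j in range(n_states)]
    backptrs = []
    for t in range(1, len(log_obs)):
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor depends only on the transition scores.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            # The observation term is down-weighted for unreliable frames.
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + confidences[t] * log_obs[t][j])
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


def to_log(rows):
    """Convert a table of probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]
```

With all confidences set to 1 this reduces to the standard Viterbi algorithm. Setting a frame's confidence to 0 removes its observation term, so the decision for that frame is driven by the transition model and the neighbouring frames alone.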
