Recognition of packet loss speech using the most reliable reduced-frame-rate data

In a client-server distributed speech recognition (DSR) application, speech features are extracted and quantized at the client end and sent to a remote back-end server for recognition. Although bandwidth constraints are largely eliminated, data packets may still be lost over error-prone channels. To reduce the performance degradation caused by missing frames, a frequently used error concealment approach is to restore a full-frame-rate (FFR) observation sequence for recognition at the back-end. This paper proposes an alternative approach for handling observations with lost frames. The approach first extracts the most reliable reconstructed reduced-frame-rate (RFR) observation sequence from the data received at the back-end, and then decodes it with an adapted hidden Markov model (HMM) that compensates for the mismatch between the FFR-trained model and the RFR test data. Experimental results show that a DSR system using the proposed method achieves the same level of accuracy as an FFR data reconstruction method while significantly reducing computation time. In terms of the user capacity of a DSR system, the proposed method can serve many more clients without the extra cost of installing new equipment.
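
The extraction step can be illustrated with a minimal sketch. The abstract does not say how reliability is measured or how lost frames are reconstructed, so everything below is an assumption made for illustration: lost frames are marked as `None`, the "most reliable" RFR sequence is taken to be the decimation phase that contains the most received frames, and remaining gaps are filled by repeating the nearest received frame. The function names `select_rfr_sequence` and `fill_gaps` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[float]


def select_rfr_sequence(
    frames: List[Optional[Frame]], factor: int = 2
) -> Tuple[int, List[Optional[Frame]]]:
    """Return (phase, subsequence) for the decimation phase with most received frames.

    Assumption: reliability is approximated by the count of received
    (non-None) frames; the paper may use a different criterion.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    best_phase, best_count = 0, -1
    for phase in range(factor):
        # Count the frames that actually arrived in this phase.
        count = sum(1 for f in frames[phase::factor] if f is not None)
        if count > best_count:  # ties keep the earliest phase
            best_phase, best_count = phase, count
    return best_phase, frames[best_phase::factor]


def fill_gaps(seq: List[Optional[Frame]]) -> List[Frame]:
    """Fill lost frames by repeating the previous received frame.

    Leading gaps are filled with the first received frame. This simple
    repetition stands in for whatever reconstruction the paper uses.
    """
    first = next((f for f in seq if f is not None), None)
    if first is None:
        raise ValueError("no received frames to reconstruct from")
    filled: List[Frame] = []
    last = first
    for f in seq:
        if f is not None:
            last = f
        filled.append(last)
    return filled


if __name__ == "__main__":
    # One-dimensional toy "feature vectors"; None marks a lost frame.
    received = [[0.0], None, [2.0], None, None, [5.0], [6.0], None]
    phase, rfr = select_rfr_sequence(received, factor=2)
    print(phase, fill_gaps(rfr))
```

In this sketch the selected sequence has half the frames of the FFR sequence, which is where the computational saving at the back-end would come from; decoding it would still require a model adapted to the reduced frame rate, as the abstract describes.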
