Recognition of coded speech transmitted over wireless channels

Network-based speech recognition (NSR) and distributed speech recognition (DSR) have been proposed as solutions for bringing speech recognition technologies to mobile environments. NSR is the most straightforward solution, since it does not require any modification of the mobile phone; however, DSR offers higher robustness against codec compression and transmission channel degradation. This paper explores an alternative approach to remote speech recognition that combines the advantages of NSR and DSR. In this scheme, a standard speech codec is used for speech transmission, but recognition is performed from the received codec parameters. In particular, we focus on the effect of transmission channel errors, which can cause a more severe reduction in recognition performance than codec distortion. First, we show that an NSR solution can approach DSR performance through a reconstruction technique combined with an adapted noise reduction technique originally proposed for acoustic noise. Then, these results are improved by working with recognition features extracted directly from the codec bitstream by means of parameter transcoding. The modifications to current networks required to access the bitstream are described. Upgrading the network with the tandem free operation (TFO) protocol is an attractive solution. This upgrade not only offers an overall improvement in end-to-end speech quality, but would also allow a recognition performance similar to that obtained by DSR, and even higher in poor channel conditions, when parameter transcoding is applied along with the proposed mitigation techniques.