Review of AMR speech codec- and distributed speech recognition-based speech-enabled services

In this paper, we investigate the usefulness of general-purpose speech codecs and dedicated speech recognition codecs for speech-enabled services. Specifically, we focus on third-generation (3G) WCDMA systems that use the adaptive multi-rate (AMR) speech codec and compare them with the distributed speech recognition (DSR) framework. Speech recognition experiments are carried out with the AMR speech codec in a simulated packet-switched network, whereas the performance of the DSR codec is assumed to be unaffected by transmission errors. Experimental results in British English and Mandarin Chinese show no significant performance difference between the AMR- and DSR-based recognition systems. The gain from using the dedicated DSR codec is therefore unlikely to yield a perceptible improvement in the quality of service experienced by end users. In light of these experimental results, together with other implementation and economic considerations, we conclude that dedicated speech recognition codecs such as DSR do not offer tangible benefits in real-world systems and services.
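The abstract does not state which significance test underlies the "no significant performance difference" finding. As a minimal sketch, assuming both recognizers are scored on the same test utterances (so a paired test is appropriate), McNemar's test compares how often each system is right where the other is wrong. The function name `mcnemar` and the example counts are hypothetical illustrations, not necessarily the authors' procedure.

```python
import math


def mcnemar(b: int, c: int) -> tuple[float, float]:
    """McNemar's test with continuity correction (1 degree of freedom).

    b: number of utterances only system A gets right (e.g. the AMR-based system)
    c: number of utterances only system B gets right (e.g. the DSR-based system)
    Utterances both systems get right, or both get wrong, do not affect the test.
    Returns (chi-square statistic, two-sided p-value).
    """
    if b + c == 0:
        return 0.0, 1.0  # the systems never disagree, so no evidence of a difference
    # Continuity-corrected statistic, clipped at zero when b and c are equal
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical disagreement counts between two recognizers on one test set
    stat, p = mcnemar(12, 8)
    print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

With 12 versus 8 disagreements the p-value is far above 0.05, so such a gap would not count as significant; a lopsided split such as 40 versus 10 would. Only the disagreements carry information, which is why a paired test suits two systems evaluated on identical utterances.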
