Recognizing GSM digital speech

The Global System for Mobile Communications (GSM) environment poses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first has already received much attention; source coding distortion and transmission errors, however, still need to be addressed explicitly. In this paper, we propose an alternative front-end for speech recognition over GSM networks, specifically designed to be effective against source coding distortion and transmission errors. We suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bitstream) instead of decoding it and then extracting the feature vectors from the reconstructed signal. This approach offers two significant advantages. First, the recognition system is affected only by the quantization distortion of the spectral envelope, so it avoids the other sources of distortion introduced by the encoding-decoding process. Second, when transmission errors occur, our front-end is more effective than the conventional one because it is not affected by errors in the bits allocated to the excitation signal. We have considered the half-rate and full-rate standard codecs and compared the proposed front-end with the conventional approach in two ASR tasks: speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated channel conditions, and the performance gap widens as network conditions worsen.
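As a rough illustration of what a feature vector built only from spectral-envelope parameters could look like, the sketch below converts a frame of reflection coefficients into linear-prediction (LPC) coefficients and then into LPC cepstral coefficients. It is a minimal sketch under stated assumptions, not the front-end evaluated in the paper: the abstract does not specify the feature type, the function names `reflection_to_lpc` and `lpc_to_cepstrum` are hypothetical, the reflection coefficients are assumed to have already been recovered from the quantized envelope parameters in the bitstream, and the sign convention A(z) = 1 + sum_k a[k] z^-k is a choice made here.

```python
def reflection_to_lpc(refl):
    """Step-up (Levinson) recursion.

    Maps reflection coefficients k_1..k_p to the coefficients a_1..a_p of
    A(z) = 1 + sum_k a_k z^-k (assumed sign convention).
    """
    a = []
    for k in refl:
        # Order update: a_j <- a_j + k * a_{i-j}, then append a_i = k.
        a = [a[j] + k * a[len(a) - 1 - j] for j in range(len(a))] + [k]
    return a


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole envelope 1/A(z).

    Uses the standard recursion
    c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k},
    with a_n = 0 for n greater than the prediction order.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k - 1] * a[n - k - 1] for k in range(max(1, n - p), n))
        a_n = a[n - 1] if n <= p else 0.0
        c.append(-a_n - acc / n)
    return c


# Made-up reflection coefficients for one frame (illustrative values only).
frame_refl = [0.5, -0.25]
frame_lpc = reflection_to_lpc(frame_refl)
frame_cep = lpc_to_cepstrum(frame_lpc, 4)
print(frame_lpc, frame_cep)
```

Nothing in this computation touches the excitation parameters, which is the property the abstract relies on: bit errors confined to the excitation bits leave such features unchanged.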
