Robust speech recognition for multiple topological scenarios of the GSM mobile phone system

This paper addresses robust speech recognition in the GSM mobile environment. We focus on the voice degradation caused by losses in the GSM coding scheme. We begin by proposing an experimental framework of network topologies, each consisting of several coding-decoding systems placed in tandem. After measuring recognition performance for each of these network scenarios, we attempt to improve recognition accuracy using feature compensation and model adaptation algorithms. We first compare the methods across all network topologies under the assumption that the topology is known. We then investigate the more realistic case in which the network topology the voice has passed through is unknown. The results show that robustness can be achieved even in this case.
