A comparative study of hybrid modelling techniques for improved telephone speech recognition

This paper presents a new technique for modelling heterogeneous data sources, such as speech signals received via distinctly different channels. This situation arises when an automatic speech recognition system is deployed in wireless telephony, where highly heterogeneous channels coexist and interoperate. The key problem is that a simple model may be inadequate to describe the diversity of the signal accurately, resulting in unsatisfactory recognition performance. To cope with this problem, different hybrid modelling techniques are proposed and investigated in this paper by intelligently combining models from two different environments, wireline and wireless.
