Employing heterogeneous information in a multi-stream framework

A multi-stream speech recogniser combines multiple feature streams, each containing complementary information. In the past, multi-stream research has typically focused on systems that use a single feature extraction method. This heritage from conventional speech recognisers is an unnecessary restriction, and both psychoacoustic and phonetic knowledge strongly motivate the use of heterogeneous features. In this paper we investigate how heterogeneous processing can be used in two different multi-stream configurations: first, a system where each stream handles a different frequency region of the speech (a multi-band recogniser), and second, a multi-stream recogniser where each stream handles the full frequency region. For each type of system we compare the performance of homogeneous and heterogeneous processing. We demonstrate that the use of heterogeneous information significantly improves recognition performance on clean speech, motivating us to continue exploring more specifically designed stream processing.

Heidi Christensen | Ove Andersen | Børge Lindberg
