The effects of background music on speech recognition accuracy

Recognition of broadcast data, such as TV and radio programs, is a topic of great interest. One problem with such data is the frequent presence of background music, which degrades the performance of speech recognition systems. In this paper we examine the effects of different kinds of music on automatic speech recognition systems by comparing them with the relatively well-known effects of white noise on these systems. We also examine the extent to which compensation algorithms that have been applied successfully to noisy speech also improve recognition accuracy for speech corrupted by music. We hope that these experimental comparisons will lead to a better understanding of how to compensate for the effects of background music.