Paper information - Construction and evaluation of a robust multifeature speech/music discriminator

We report on the construction of a real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input. We have examined 13 features intended to measure conceptually distinct properties of speech and/or music signals, and combined them in several multidimensional classification frameworks. We provide extensive data on system performance and the cross-validated training/test setup used to evaluate the system. For the datasets currently in use, the best classifier classifies with 5.8% error on a frame-by-frame basis, and 1.4% error when integrating long (2.4 second) segments of sound.
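The gap between the frame-level and segment-level error rates comes from pooling many frame decisions into one. A minimal sketch of that idea follows, assuming a simple majority vote over non-overlapping windows; the abstract does not state the integration rule or the frame rate, so the function name `integrate_frame_labels` and the 50 frames-per-second figure are illustrative assumptions only, not the authors' method.

```python
from collections import Counter
from typing import List, Sequence


def integrate_frame_labels(frame_labels: Sequence[str],
                           frames_per_segment: int) -> List[str]:
    """Collapse per-frame labels into one label per non-overlapping segment.

    Assumption: majority vote. The paper may integrate differently.
    """
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    segment_labels = []
    for start in range(0, len(frame_labels), frames_per_segment):
        window = frame_labels[start:start + frames_per_segment]
        # most_common(1) yields the most frequent label in this window;
        # on a tie, the label encountered first wins.
        segment_labels.append(Counter(window).most_common(1)[0][0])
    return segment_labels


# Hypothetical frame rate (not given in the abstract): 50 frames per second,
# so a 2.4 second segment spans 120 frames.
FRAMES_PER_SECOND = 50
FRAMES_PER_SEGMENT = int(round(2.4 * FRAMES_PER_SECOND))

if __name__ == "__main__":
    # 100 correct "speech" frames plus 20 misclassified ones, then pure music.
    labels = ["speech"] * 100 + ["music"] * 20 + ["music"] * 120
    print(integrate_frame_labels(labels, FRAMES_PER_SEGMENT))
```

In this toy input, the 20 misclassified frames in the first window are outvoted, which is the mechanism by which segment-level error can fall below frame-level error.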

Malcolm Slaney | Eric D. Scheirer
