EFFECTIVE SINGING VOICE DETECTION IN POPULAR MUSIC USING ARMA FILTERING

Locating singing voice segments is essential for convenient indexing, browsing, and retrieval of large music archives and catalogues. Furthermore, it is beneficial for automatic music transcription and annotation. The approach described in this paper uses Mel-Frequency Cepstral Coefficients in conjunction with Gaussian Mixture Models to discriminate between two classes of data (instrumental music and singing voice with music background). Due to imperfect classification behavior, the categorization without additional post-processing tends to alternate within a very short time span, whereas singing voice tends to be continuous for several frames. Thus, various tests have been performed to identify a suitable decision function and corresponding smoothing methods. Results are reported by comparing the performance of straightforward likelihood-based classification vs. post-processing with an autoregressive moving average filtering method.
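
The post-processing idea can be illustrated with a minimal sketch. It assumes that each frame already has a log-likelihood ratio (vocal model vs. instrumental model) and that the frame is labeled vocal when the ratio exceeds zero. The filter coefficients, the threshold, and the sample values below are hypothetical and chosen only for illustration; they are not the settings used in the paper.

```python
def arma_smooth(x, b, a):
    """Apply y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k], with a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Moving-average part: weighted sum of current and past inputs.
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Autoregressive part: feedback from past outputs.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def classify(scores, threshold=0.0):
    """Label a frame 1 (vocal) if its score exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]


def count_switches(labels):
    """Count how often the label changes between consecutive frames."""
    return sum(1 for p, q in zip(labels, labels[1:]) if p != q)


# Hypothetical frame-wise log-likelihood ratios (not data from the paper).
llr = [-1.2, -0.8, 0.3, -0.9, -1.1, 1.0, 1.4,
       -0.2, 1.1, 0.9, 1.3, -0.4, 1.2, 0.8]

# Assumed first-order ARMA coefficients with unit gain at DC.
b = [0.25, 0.25]
a = [1.0, -0.5]

raw = classify(llr)
smoothed = classify(arma_smooth(llr, b, a))

print("raw     :", raw, "switches:", count_switches(raw))
print("smoothed:", smoothed, "switches:", count_switches(smoothed))
```

In this toy sequence the unsmoothed decision flips between classes several times, while the filtered decision changes only once, which matches the expectation that singing voice persists over several frames. The filter also delays the detected onset by about one frame here, so the filter order and coefficients trade stability against temporal accuracy.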
