The literature on content-based music retrieval has largely sidestepped acoustic issues by using MIDI-format music. This paper, however, considers content-based classification and retrieval for a typical digital music archive stored as MPEG layer III audio. Two statistical techniques are investigated and appraised. Gaussian mixture modelling performs well, achieving an accuracy of 92% on a music classification task. A tree-based vector quantization scheme gives marginally worse performance, but within a faster, scalable framework. Good results are also reported for music retrieval-by-similarity using the same techniques. Mel-frequency cepstral coefficients (MFCCs) parameterize the audio well, but they are slow to compute from the compressed domain. A new parameterization, MP3CEP, based on a partial decompression of MPEG layer III audio, is therefore proposed to enable music processing at user-interactive speeds. Overall, the techniques described provide useful tools for managing a typical digital music library.
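The Gaussian mixture approach to classification can be illustrated with a minimal sketch of maximum-likelihood classification. It assumes one pre-trained GMM per music class (for example, trained by EM on MFCC-like feature frames). It also assumes diagonal covariances and treats frames as independent. The abstract specifies neither of these, so both are assumptions here. The names `DiagonalGMM` and `classify` are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagonalGMM:
    """Gaussian mixture with diagonal covariances; parameters are assumed pre-trained."""
    weights: List[float]          # mixture weights, assumed positive and summing to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def frame_log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
        component_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # Log density of a diagonal Gaussian: sum of per-dimension log densities.
            log_n = -0.5 * sum(
                math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            component_logs.append(math.log(w) + log_n)
        top = max(component_logs)
        return top + math.log(sum(math.exp(c - top) for c in component_logs))

    def log_likelihood(self, frames: Sequence[Sequence[float]]) -> float:
        """Total log-likelihood of a clip, assuming feature frames are independent."""
        return sum(self.frame_log_likelihood(f) for f in frames)


def classify(frames: Sequence[Sequence[float]], models: Dict[str, DiagonalGMM]) -> str:
    """Return the class label whose GMM assigns the clip the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(frames))


# Toy usage with hypothetical 2-D features and made-up class labels.
models = {
    "class_a": DiagonalGMM([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "class_b": DiagonalGMM([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
print(classify([[0.1, 0.2], [0.9, 1.1]], models))  # prints "class_a"
```

Working in the log domain with log-sum-exp avoids underflow when many frames are combined. Summing per-frame log-likelihoods is the usual "bag of frames" simplification.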