Content-based music similarity search and emotion detection

This paper investigates the use of acoustic-based features for music information retrieval. Two specific problems are studied: similarity search (searching for music sound files similar to a given music sound file) and emotion detection (detecting the emotion conveyed by music sounds). The Daubechies wavelet coefficient histograms (Li, T., et al., SIGIR'03, p. 282-9, 2003), which consist of moments of the coefficients obtained by applying the Db8 wavelet filter, are combined with the timbral features extracted using the MARSYAS system of G. Tzanetakis and P. Cook (IEEE Trans. on Speech and Audio Process., vol. 10, no. 5, p. 293-8, 2002) to generate compact music features. For similarity search, the distance between two sound files is defined as the Euclidean distance between their normalized representations. Given this distance measure, the sound files closest to an input sound file are retrieved. Experiments on jazz vocal and classical sound files achieve a very high level of accuracy. Emotion detection is cast as a multiclass classification problem, decomposed into multiple binary classification problems, and solved with support vector machines trained on the extracted features. Our experiments on emotion detection achieved reasonably accurate performance and provided some insights for future work.
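To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the approach described above, assuming NumPy, PyWavelets, SciPy, and scikit-learn as stand-ins for the original feature-extraction and learning code. The DWCH statistics are only approximated here (low-order moments and energy per Db8 subband), the MARSYAS timbral features are omitted, and all function names and parameter choices are illustrative assumptions.

```python
import numpy as np
import pywt                                      # Daubechies (Db8) wavelet transform
from scipy.stats import skew
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline


def dwch_like_features(signal, wavelet="db8", level=7):
    """Approximate DWCH-style features: low-order statistics of the wavelet
    coefficients in each subband plus the subband energy (an assumption;
    the paper's exact histogram moments may differ)."""
    coeffs = pywt.wavedec(np.asarray(signal, dtype=float), wavelet, level=level)
    feats = []
    for c in coeffs:
        feats.extend([c.mean(), c.std(), skew(c), np.mean(np.abs(c))])
    return np.array(feats)


# --- Similarity search: Euclidean distance between normalized feature vectors ---
def build_similarity_index(signals):
    X = np.vstack([dwch_like_features(s) for s in signals])
    scaler = StandardScaler().fit(X)             # normalize each feature dimension
    index = NearestNeighbors(metric="euclidean").fit(scaler.transform(X))
    return scaler, index


def most_similar(query_signal, scaler, index, k=5):
    q = scaler.transform(dwch_like_features(query_signal).reshape(1, -1))
    dist, idx = index.kneighbors(q, n_neighbors=k)
    return idx[0], dist[0]                       # indices and distances of the k closest files


# --- Emotion detection: multiclass problem decomposed into binary SVMs ---
def train_emotion_classifier(signals, labels):
    X = np.vstack([dwch_like_features(s) for s in signals])
    # One binary SVM per emotion label (one-vs-rest decomposition).
    clf = make_pipeline(StandardScaler(), OneVsRestClassifier(SVC(kernel="rbf")))
    return clf.fit(X, labels)
```

A retrieval call would then look like `idx, dist = most_similar(query, scaler, index)`, and `train_emotion_classifier(train_signals, train_labels).predict(...)` would label unseen clips; both functions are meant only to illustrate the two tasks the paper evaluates.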

[1] Jonathan Foote et al. Automatic Music Summarization via Similarity Analysis. ISMIR, 2002.

[2] Biing-Hwang Juang et al. Fundamentals of Speech Recognition. Prentice Hall Signal Processing Series, 1993.

[3] Jonathan Foote et al. Audio Retrieval by Rhythmic Similarity. ISMIR, 2002.

[4] U. Nam. Addressing the Same but Different - Different but Similar Problem in Automatic Music Classification. 2001.

[5] Alex Waibel et al. Detecting Emotions in Speech. 1998.

[6] Mark B. Sandler et al. Classification of Audio Signals Using Statistical Features on Time and Wavelet Transform Domains. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), 1998.

[7] Beth Logan et al. A Content-Based Music Similarity Function. 2001.

[8] K. Hevner. Experimental Studies of the Elements of Expression in Music. 1936.

[9] David Huron. Perceptual and Cognitive Applications in Music Information Retrieval. ISMIR, 2000.

[10] Tao Li et al. Detecting Emotion in Music. ISMIR, 2003.

[11] Valery A. Petrushin et al. Emotion in Speech: Recognition and Application to Call Centers. 1999.

[12] Man-Kwan Shan et al. A Personalized Music Filtering System Based on Melody Style Classification. Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), 2002.

[13] Barry Vercoe et al. Folk Music Classification Using Hidden Markov Models. 2001.

[14] Ingrid Daubechies et al. Ten Lectures on Wavelets. 1992.

[15] Adrian C. North et al. The Social Psychology of Music. 1997.

[16] David S. Watson et al. A Machine Learning Approach to Musical Style Recognition. ICMC, 1997.

[17] Sethuraman Panchanathan et al. Wavelet-Histogram Method for Face Recognition. Journal of Electronic Imaging, 2000.

[18] Eric D. Scheirer et al. Tempo and Beat Analysis of Acoustic Musical Signals. The Journal of the Acoustical Society of America, 1998.

[19] Elias Pampalk. Using Psychoacoustic Models and Self-Organizing Maps to Create a Hierarchical Structuring of Music by Sound Similarity. 2002.

[20] C. Krumhansl. Music as Cognition. 1987.

[21] Tao Li et al. A Comparative Study on Content-Based Music Genre Classification. SIGIR, 2003.

[22] Yoram Singer et al. BoosTexter: A Boosting-Based System for Text Categorization. Machine Learning, 2000.

[23] Yoichi Muraoka et al. A Beat Tracking System for Acoustic Signals of Music. MULTIMEDIA '94, 1994.

[24] Shingo Uchihashi et al. The Beat Spectrum: A New Approach to Rhythm Analysis. IEEE International Conference on Multimedia and Expo (ICME 2001), 2001.

[25] Hrishikesh Deshpande et al. Classification of Music Signals in the Visual Domain. 2001.

[26] Beth Logan et al. Mel Frequency Cepstral Coefficients for Music Modeling. ISMIR, 2000.

[27] George Tzanetakis et al. Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 2002.

[28] Vladimir Vapnik et al. Statistical Learning Theory. 1998.