Content-based mood classification for photos and music: a generic multi-modal classification framework and evaluation approach

Mood or emotion information is often used as search terms or navigation properties within multimedia archives, retrieval systems and multimedia players. Most of these applications rely on end-users or experts to tag multimedia objects with mood annotations. Within the scientific community, different approaches to content-based music, photo and multi-modal mood classification can be found, using a wide range of mood definitions or models and completely different test suites. The purpose of this paper is threefold: to review common mood models and assess their flexibility, to present a generic multi-modal mood classification framework that uses various audio-visual features and multiple classifiers, and to introduce a novel music and photo mood classification reference set for evaluation. The classification framework serves as the basis for applications such as automatic media tagging or music slideshow players, while the reference set enables the comparison of algorithms from different research groups. Finally, the results obtained with the proposed framework are presented and discussed, and conclusions for future work are drawn.
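
To make the idea of such a framework more concrete, the following Python sketch shows one possible multi-modal mood classifier: pre-extracted audio and photo feature vectors are fused by simple concatenation and fed to a support vector machine. The placeholder feature extraction functions, the mood label set, the early-fusion strategy and the use of scikit-learn are assumptions made purely for illustration; they are not the specific components of the framework described in the paper.

    # Minimal sketch of a multi-modal (audio + photo) mood classifier.
    # Assumptions: pre-extracted feature vectors per media item, a small
    # fixed mood vocabulary, and early fusion by feature concatenation.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    MOODS = ["happy", "sad", "aggressive", "calm"]  # hypothetical label set

    def extract_audio_features(n_items, dim=20):
        # Stand-in for real audio descriptors (e.g. timbre, tempo, energy).
        return np.random.rand(n_items, dim)

    def extract_photo_features(n_items, dim=12):
        # Stand-in for real visual descriptors (e.g. color histograms).
        return np.random.rand(n_items, dim)

    n_items = 200
    # Early fusion: concatenate audio and visual feature vectors per item.
    X = np.hstack([extract_audio_features(n_items),
                   extract_photo_features(n_items)])
    y = np.random.randint(len(MOODS), size=n_items)  # stand-in ground truth

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    scores = cross_val_score(clf, X, y, cv=5)
    print("cross-validated accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))

In a real system, the random placeholders would be replaced by actual descriptors and by mood annotations from a reference set, and other classifiers could be substituted for the SVM in the same pipeline.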
