Affective perception of Musical Television

In this work, a three-stage hierarchical Support Vector Machine (SVM) classifier is adopted to identify six affective modes (happy, angry, excited, nervous, sad and calm) in Musical Television (MTV) sequences, which comprise both audio and video signals. To characterize these modes, the scheme uses audio features (the spectral centroid, spectral spread, zero crossing rate, peak of the zero crossing rates, duration, tempo and variance of the FFT coefficients) together with visual features (color temperature and the standard deviations of motion vectors). The features are chosen according to how their physical characteristics relate to emotion, and they are used jointly to increase recognition accuracy. In particular, the features best suited to each classification stage are investigated separately. The experimental results show that the proposed scheme achieves a fair recognition rate of 73.3%, a substantial improvement over a one-stage scheme that uses audio features only.
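
Several of the audio features listed above have standard frame-level definitions. The following is a minimal sketch that computes three of them (spectral centroid, spectral spread and zero crossing rate) for one audio frame. It assumes the common magnitude-weighted definitions of centroid and spread; the paper's exact formulas, frame length and normalization are not given here, and the function names are hypothetical. A naive DFT keeps the example dependency-free, whereas a real pipeline would use an FFT.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT; use an FFT in practice)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(abs(s))
    return mags


def spectral_centroid_and_spread(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) and the standard deviation around it (Hz).

    Assumption: weights are DFT magnitudes; some definitions use power instead.
    """
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        # Silent frame: no spectral mass, so report zeros by convention.
        return 0.0, 0.0
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    spread = math.sqrt(sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total)
    return centroid, spread


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


if __name__ == "__main__":
    sr = 8000
    # A 1 kHz tone with a small phase offset, 64 samples long.
    frame = [math.sin(math.pi * t / 4 + math.pi / 8) for t in range(64)]
    print(spectral_centroid_and_spread(frame, sr), zero_crossing_rate(frame))
```

For a pure tone the centroid sits at the tone frequency and the spread is near zero, while broadband or percussive frames push both upward. Frame-level values like these would typically be aggregated over a clip (for example, averaged) before being passed to an SVM stage; that aggregation step is likewise an assumption rather than a detail given in the abstract.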
