Emotional scene understanding based on acoustic signals using adaptive neuro-fuzzy inference system

We propose a novel approach to recognizing positive or negative emotions from acoustic signals in movies by extracting musical components such as tempo, loudness, and melody and then applying an adaptive neuro-fuzzy inference system (ANFIS) with fuzzy clustering. To extract emotional features from the acoustic signal, we first transform the sound into a spectrogram, which visually represents characteristic information of the sound such as tempo, loudness, and melody. We then apply a fuzzy model to the spectrogram to obtain effective emotional features of the sound. The extracted tempo, loudness, and melody information serves as input to an ANFIS whose rule base is initialized by fuzzy c-means (FCM) clustering. Finally, the ANFIS classifies the sound as conveying positive or negative emotion, and the result is compared against mean opinion scores collected from human viewers of the test movies.
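The feature-extraction step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the exact feature definitions, so the loudness proxy (mean log frame energy) and tempo proxy (dominant modulation frequency of the energy envelope, in BPM) used below are illustrative stand-ins, melody extraction is omitted, and `scipy.signal.spectrogram` is assumed as the spectrogram front end. The ANFIS/FCM classifier itself is not shown.

```python
import numpy as np
from scipy import signal


def spectrogram_features(audio, sr):
    """Extract coarse loudness and tempo proxies from a spectrogram.

    Illustrative stand-ins only; the paper's actual feature
    definitions are not given in the abstract.
    """
    # Magnitude spectrogram: 1024-sample frames, 50% overlap.
    f, t, S = signal.spectrogram(audio, fs=sr, nperseg=1024, noverlap=512)

    # Loudness proxy: mean log-energy over all frames.
    frame_energy = S.sum(axis=0)
    loudness = float(np.mean(np.log1p(frame_energy)))

    # Tempo proxy: dominant modulation frequency of the (mean-removed)
    # energy envelope, found via the envelope's FFT magnitude.
    env = frame_energy - frame_energy.mean()
    hop = 512 / sr                         # seconds between frames
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=hop)

    # Restrict to a plausible tempo band (0.5-4 Hz, i.e. 30-240 BPM).
    band = (freqs >= 0.5) & (freqs <= 4.0)
    if band.any():
        tempo_bpm = float(freqs[band][np.argmax(spec[band])] * 60.0)
    else:
        tempo_bpm = 0.0
    return loudness, tempo_bpm
```

For example, a 440 Hz tone amplitude-modulated at 2 Hz should yield a tempo estimate near 120 BPM, since its energy envelope pulses twice per second.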