Audio Generation from a Scene Considering Its Emotional Aspect

Scenes can convey emotion, much as music does. If so, then given an image it may be possible to generate music that elicits a similar emotional reaction from users; the challenge lies in how to do so. In this paper, we use the Hue, Saturation, and Lightness features of image samples extracted from video excerpts, together with the tempo, loudness, and rhythm of audio samples extracted from the same excerpts, to train a group of neural networks, including a Recurrent Neural Network and a Neuro-Fuzzy Network, and obtain an audio signal that evokes a similar emotional response in a listener. This work could be a meaningful contribution to the field of Human-Computer Interaction, since it offers a way to enrich the interaction between computers and humans. Experimental results show that the model effectively produces audio that matches the video, evoking a similar emotion in the viewer.
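As a rough illustration of the paired features described above, the following sketch extracts mean Hue, Saturation, and Lightness from a sampled video frame and tempo plus a loudness proxy from the matching audio excerpt. This is only a hypothetical example: the paper does not name specific tools, so OpenCV (cv2) and librosa, as well as the file names shown, are assumptions made purely for demonstration.

# Illustrative feature extraction for paired video-frame / audio-excerpt samples.
# Assumed libraries: OpenCV (cv2) and librosa; not necessarily those used in the paper.
import cv2
import librosa
import numpy as np

def frame_hsl_features(frame_bgr: np.ndarray) -> np.ndarray:
    """Mean hue, saturation, and lightness of one video frame."""
    hls = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HLS)  # OpenCV orders channels H, L, S
    hue, lightness, saturation = cv2.split(hls)
    return np.array([hue.mean(), saturation.mean(), lightness.mean()])

def clip_audio_features(wav_path: str) -> np.ndarray:
    """Tempo (BPM) and a loudness proxy (mean RMS energy) for one audio excerpt."""
    y, sr = librosa.load(wav_path, sr=22050)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    loudness = float(librosa.feature.rms(y=y).mean())
    return np.array([float(tempo), loudness])

# Hypothetical pairing: HSL features of a sampled frame form the network input,
# and the audio features of the matching excerpt serve as the training target.
# frame = cv2.imread("frame_0001.png")
# x = frame_hsl_features(frame)
# y = clip_audio_features("excerpt_0001.wav")

In this sketch, each video excerpt yields one (image-feature, audio-feature) pair; a collection of such pairs could then be used to train the recurrent and neuro-fuzzy models mentioned in the abstract.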