Scene-Aware Background Music Synthesis

In this paper, we introduce an interactive algorithm that synthesizes background music guided by visual content. The algorithm works as a two-stage cascade: Scene Visual Analysis followed by Background Music Synthesis. In the first stage, neural networks analyze the sentiment of the input scene. In the second stage, background music is synthesized in real time by optimizing a cost function that guides the selection of music clips and the transitions between them, so as to maximize both the emotional consistency between the visual and auditory content and the continuity of the music. Our experiments demonstrate that the proposed approach can synthesize dynamic background music for different types of scenarios. We also conduct quantitative and qualitative analyses of the results synthesized for multiple example scenes to validate the efficacy of our approach.
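
The second stage can be illustrated with a minimal sketch. It is not the paper's actual cost function, which the abstract does not specify. It assumes that each music clip and the analyzed scene carry a hypothetical (valence, arousal) emotion vector, that emotion consistency is measured by Euclidean distance, and that continuity is approximated by the relative tempo change between consecutive clips. The names Clip, emotion_cost, continuity_cost, select_next_clip, and the weights w_emotion and w_continuity are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Clip:
    """A pre-recorded music clip (hypothetical representation)."""
    name: str
    emotion: Tuple[float, float]  # assumed (valence, arousal), each in [0, 1]
    tempo: float                  # beats per minute


def emotion_cost(clip: Clip, scene_emotion: Tuple[float, float]) -> float:
    """Assumed emotion term: distance between clip and scene emotion."""
    return math.dist(clip.emotion, scene_emotion)


def continuity_cost(prev: Optional[Clip], clip: Clip) -> float:
    """Assumed continuity term: relative tempo jump from the previous clip."""
    if prev is None:
        return 0.0  # nothing is playing yet, so any clip is equally continuous
    return abs(prev.tempo - clip.tempo) / max(prev.tempo, clip.tempo)


def select_next_clip(
    clips: Sequence[Clip],
    scene_emotion: Tuple[float, float],
    prev: Optional[Clip] = None,
    w_emotion: float = 1.0,
    w_continuity: float = 0.5,
) -> Clip:
    """Greedily pick the clip that minimizes the weighted total cost."""
    def total(clip: Clip) -> float:
        return (w_emotion * emotion_cost(clip, scene_emotion)
                + w_continuity * continuity_cost(prev, clip))
    return min(clips, key=total)


# Toy clip library with made-up emotion and tempo values.
clips = [
    Clip("calm", (0.6, 0.2), 70.0),
    Clip("tense", (0.2, 0.9), 140.0),
    Clip("joyful", (0.9, 0.7), 120.0),
]

first = select_next_clip(clips, scene_emotion=(0.8, 0.6))
following = select_next_clip(clips, scene_emotion=(0.5, 0.5), prev=clips[1])
print(first.name, following.name)
```

Calling select_next_clip once per scene update gives a greedy, real-time selection loop. Raising w_continuity favors smooth transitions over a close emotional match, while setting it to zero reduces the choice to the emotionally nearest clip.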
