Foley Music: Learning to Generate Music from Videos
Chuang Gan | Peihao Chen | Joshua B. Tenenbaum | Antonio Torralba | Deng Huang
[1] Xiao Liu,et al. Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] Douglas Eck,et al. Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset , 2018, ICLR.
[3] Andrew Zisserman,et al. Sight to Sound: An End-to-End Approach for Visual Piano Transcription , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Yaser Sheikh,et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] Xiaogang Wang,et al. Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation , 2020, ECCV.
[6] Chuang Gan,et al. Self-Supervised Moving Vehicle Tracking With Stereo Sound , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[7] Jaakko Lehtinen,et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion , 2017, ACM Trans. Graph..
[8] Yaser Sheikh,et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] Isabella Poggi,et al. Gestures in performance , 2009 .
[10] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[11] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[12] Colin Raffel,et al. A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music , 2018, ICML.
[13] Chen Fang,et al. Visual to Sound: Generating Natural Sound for Videos in the Wild , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[14] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[15] Andrew Zisserman,et al. Emotion Recognition in Speech using Cross-Modal Transfer in the Wild , 2018, ACM Multimedia.
[16] Gaëtan Hadjeres,et al. Deep Learning Techniques for Music Generation - A Survey , 2017, ArXiv.
[17] Hongjia Zhang,et al. Scene recognition under special traffic conditions based on deep multi‐task learning , 2020, The Journal of Engineering.
[18] Chuang Gan,et al. ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation , 2020, ArXiv.
[19] Dahua Lin,et al. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, AAAI.
[20] Ira Kemelmacher-Shlizerman,et al. Synthesizing Obama , 2017, ACM Trans. Graph..
[21] Chuang Gan,et al. The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[22] Dong Chen,et al. Cross-Task Transfer for Multimodal Aerial Scene Recognition , 2020, ArXiv.
[23] Jitendra Malik,et al. Learning Individual Styles of Conversational Gesture , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Tae-Hyun Oh,et al. Listen to Look: Action Recognition by Previewing Audio , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Frank Nielsen,et al. DeepBach: a Steerable Model for Bach Chorales Generation , 2016, ICML.
[26] Xiao Liu,et al. Multimodal Keyless Attention Fusion for Video Classification , 2018, AAAI.
[27] Yi-Hsuan Yang,et al. MidiNet: A Convolutional Generative Adversarial Network for Symbolic-Domain Music Generation , 2017, ISMIR.
[28] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[29] Mubarak Shah,et al. Multimodal Analysis for Identification and Segmentation of Moving-Sounding Objects , 2013, IEEE Transactions on Multimedia.
[30] Sanja Fidler,et al. Song From PI: A Musically Plausible Network for Pop Music Generation , 2016, ICLR.
[31] Chenliang Xu,et al. Deep Cross-Modal Audio-Visual Generation , 2017, ACM Multimedia.
[32] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[33] Douglas Eck,et al. This time with feeling: learning expressive musical performance , 2018, Neural Computing and Applications.
[34] Chuang Gan,et al. Look, Listen, and Act: Towards Audio-Visual Embodied Navigation , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).
[35] Kumar Krishna Agrawal,et al. GANSynth: Adversarial Neural Audio Synthesis , 2019, ICLR.
[36] Ira Kemelmacher-Shlizerman,et al. Audio to Body Dynamics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[37] Kun Su,et al. Audeo: Audio Generation for a Silent Performance Video , 2020, NeurIPS.
[38] Gaurav Sharma,et al. Creating a Multitrack Classical Music Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications , 2016, IEEE Transactions on Multimedia.
[39] Kristen Grauman,et al. 2.5D Visual Sound , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[41] Yisong Yue,et al. A deep learning approach for generalized speech animation , 2017, ACM Trans. Graph..
[42] Javier R. Movellan,et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds , 1999, NIPS.
[43] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Lorenzo Torresani,et al. Co-Training of Audio and Video Representations from Self-Supervised Temporal Synchronization , 2018 .
[45] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[46] Yaser Sheikh,et al. Hand Keypoint Detection in Single Images Using Multiview Bootstrapping , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] M. Leman,et al. Musical gestures : sound, movement, and meaning , 2010 .
[48] Chuang Gan,et al. Music Gesture for Visual Sound Separation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[50] Chuang Gan,et al. Self-supervised Audio-visual Co-segmentation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[51] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[52] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[53] Paulo Carvalho,et al. A fuzzy data reduction cluster method based on boundary information for large datasets , 2019, Neural Computing and Applications.
[54] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Rogério Schmidt Feris,et al. Learning to Separate Object Sounds by Watching Unlabeled Video , 2018, ECCV.
[57] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[58] Xiaogang Wang,et al. Vision-Infused Deep Audio Inpainting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[59] Dahua Lin,et al. Recursive Visual Sound Separation Using Minus-Plus Net , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[60] Chuang Gan,et al. Generating Visually Aligned Sound From Videos , 2020, IEEE Transactions on Image Processing.
[61] Ming-Hsuan Yang,et al. Structural Constraint Data Association for Online Multi-object Tracking , 2018, International Journal of Computer Vision.
[62] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[63] B. Holden. Listen and learn , 2002 .
[64] Kun Zhao,et al. An Emotional Symbolic Music Generation System based on LSTM Networks , 2019, 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC).
[65] Andrew Zisserman,et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[66] Antonio Torralba,et al. Anticipating Visual Representations from Unlabeled Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Yoav Goldberg,et al. At Your Fingertips: Automatic Piano Fingering Detection , 2019 .
[68] Phillip Isola,et al. Contrastive Multiview Coding , 2019, ECCV.
[69] Nuno Vasconcelos,et al. Self-Supervised Generation of Spatial Audio for 360 Video , 2018, NIPS 2018.
[70] Ramakant Nevatia,et al. Visually Indicated Sound Generation by Perceptually Optimized Classification , 2018, ECCV Workshops.
[71] Andrew M. Dai,et al. Music Transformer: Generating Music with Long-Term Structure , 2018, ICLR.
[72] Ashish Vaswani,et al. Self-Attention with Relative Position Representations , 2018, NAACL.
[73] Joon Son Chung,et al. You Said That?: Synthesising Talking Faces from Audio , 2019, International Journal of Computer Vision.