Conditional Generation of Audio from Video via Foley Analogies