Binaural audio generation via multi-task learning
[1] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[2] Chuang Gan,et al. The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[3] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Andrew Zisserman,et al. Learnable PINs: Cross-Modal Embeddings for Person Identity , 2018, ECCV.
[5] Nuno Vasconcelos,et al. Self-Supervised Generation of Spatial Audio for 360 Video , 2018, NIPS 2018.
[6] Chuang Gan,et al. Music Gesture for Visual Sound Separation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Ben P. Milner,et al. Generating Intelligible Audio Speech From Visual Speech , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[8] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[9] Shinji Watanabe,et al. Weakly-Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation , 2018, 2019 International Joint Conference on Neural Networks (IJCNN).
[10] Malcolm Slaney,et al. Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers , 2017, ArXiv.
[11] Simon Haykin,et al. The Cocktail Party Problem , 2005, Neural Computation.
[12] Nikunj Raghuvanshi,et al. Aerophones in flatland , 2015, ACM Trans. Graph..
[13] Andrzej Cichocki,et al. Nonnegative Matrix and Tensor Factorization , 2007 .
[14] Ravish Mehra,et al. Efficient HRTF-based Spatial Audio for Area and Volumetric Sources , 2016, IEEE Transactions on Visualization and Computer Graphics.
[15] Ravish Mehra,et al. Efficient construction of the spatial room impulse response , 2017, 2017 IEEE Virtual Reality (VR).
[16] Dingzeyu Li,et al. Scene-aware audio for 360° videos , 2018, ACM Trans. Graph..
[17] Dinesh Manocha,et al. Diffraction Kernels for Interactive Sound Propagation in Dynamic Environments , 2018, IEEE Transactions on Visualization and Computer Graphics.
[18] Dinesh Manocha,et al. Sound Synthesis, Propagation, and Rendering: A Survey , 2020, ArXiv.
[19] Rui Wang,et al. Deep Audio-visual Learning: A Survey , 2020, International Journal of Automation and Computing.
[20] Timothy R. Langlois,et al. Scene-Aware Audio Rendering via Deep Acoustic Analysis , 2019, IEEE Transactions on Visualization and Computer Graphics.
[21] Justin Salamon,et al. Telling Left From Right: Learning Spatial Correspondence of Sight and Sound , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Patrick Pérez,et al. Identify, Locate and Separate: Audio-Visual Object Extraction in Large Video Collections Using Weak Supervision , 2018, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[23] Qiang Yang,et al. An Overview of Multi-task Learning , 2018 .
[24] A. Krokstad,et al. Calculating the acoustical room response by the use of a ray tracing technique , 1968 .
[25] Sebastian Ruder,et al. An Overview of Multi-Task Learning in Deep Neural Networks , 2017, ArXiv.
[26] Hung-Yu Tseng,et al. Self-Supervised Audio Spatialization with Correspondence Classifier , 2019, 2019 IEEE International Conference on Image Processing (ICIP).
[27] Tuomas Virtanen,et al. Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[28] Josh H. McDermott. The cocktail party problem , 2009, Current Biology.
[29] Zhaoxiang Zhang,et al. CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation , 2017, AAAI.
[30] Dinesh Manocha,et al. Wave-ray coupling for interactive sound propagation in large complex scenes , 2013, ACM Trans. Graph..
[31] Kristen Grauman,et al. Co-Separating Sounds of Visual Objects , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[32] Luciano Fadiga,et al. Face Landmark-based Speaker-independent Audio-visual Speech Enhancement in Multi-talker Environments , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Rogério Schmidt Feris,et al. Learning to Separate Object Sounds by Watching Unlabeled Video , 2018, ECCV.
[34] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[35] Dinesh Manocha,et al. Wave-based sound propagation in large open scenes using an equivalent source formulation , 2013, TOGS.
[36] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[37] Xuelong Li,et al. Deep Co-Clustering for Unsupervised Audiovisual Learning , 2018, ArXiv.
[38] Bo Dai,et al. Visually Informed Binaural Audio Generation without Binaural Audios , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Andrew J. Davison,et al. End-To-End Multi-Task Learning With Attention , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Brian Kingsbury,et al. New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[41] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[42] M A Lord Rayleigh,et al. On Our Perception of the Direction of a Source of Sound , 1875 .
[43] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Emilia Gómez,et al. Monoaural Audio Source Separation Using Deep Convolutional Neural Networks , 2017, LVA/ICA.
[45] Chen Fang,et al. Visual to Sound: Generating Natural Sound for Videos in the Wild , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[46] Luc Van Gool,et al. Multi-Task Learning for Dense Prediction Tasks: A Survey , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[47] Alex Hofmann,et al. Points2Sound: from mono to binaural audio using 3D point cloud scenes , 2021, EURASIP Journal on Audio, Speech, and Music Processing.
[48] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[49] Ross B. Girshick,et al. Fast R-CNN , 2015, arXiv:1504.08083.
[50] Kristen Grauman,et al. 2.5D Visual Sound , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[51] DeLiang Wang,et al. Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[52] Dinesh Manocha,et al. Interactive sound propagation with bidirectional path tracing , 2016, ACM Trans. Graph..
[53] Adrian Hilton,et al. Immersive Spatial Audio Reproduction for VR/AR Using Room Acoustic Modelling from 360° Images , 2019, 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR).
[54] F. Wightman,et al. The dominant role of low-frequency interaural time differences in sound localization. , 1992, The Journal of the Acoustical Society of America.
[55] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[56] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[57] Xiaogang Wang,et al. Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation , 2020, ECCV.
[58] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Andrew Zisserman,et al. X2Face: A network for controlling face generation by using images, audio, and pose codes , 2018, ECCV.