2.5D Visual Sound
[1] Stephen Gould,et al. Single image depth estimation from predicted semantic labels , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[2] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[3] Kazuhiro Iida,et al. Personalization of head-related transfer functions in the median plane based on the anthropometry of the listener's pinnae , 2014, The Journal of the Acoustical Society of America.
[4] Shmuel Peleg,et al. Visual Speech Enhancement , 2017, INTERSPEECH.
[5] Tuomas Virtanen,et al. Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[6] Chen Fang,et al. Visual to Sound: Generating Natural Sound for Videos in the Wild , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[8] Alessio Del Bue,et al. Seeing the Sound: A New Multimodal Imaging Device for Computer Vision , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).
[9] Rémi Gribonval,et al. Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[10] Antonio Torralba,et al. See, Hear, and Read: Deep Aligned Representations , 2017, ArXiv.
[11] Robert Höldrich,et al. A 3D Ambisonic Based Binaural Sound Reproduction System , 2003.
[12] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[13] Daniel P. W. Ellis,et al. MIR_EVAL: A Transparent Implementation of Common MIR Metrics , 2014, ISMIR.
[14] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[15] Paris Smaragdis,et al. Audio/Visual Independent Components , 2003.
[16] DeLiang Wang,et al. Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[17] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph.
[18] Chenliang Xu,et al. Deep Cross-Modal Audio-Visual Generation , 2017, ACM Multimedia.
[19] James R. Glass,et al. Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input , 2018, ECCV.
[20] Yoav Y. Schechner,et al. Harmony in Motion , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[21] Nuno Vasconcelos,et al. Self-Supervised Generation of Spatial Audio for 360° Video , 2018, NIPS.
[22] Maja Pantic,et al. Audio-visual object localization and separation using low-rank and sparsity , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Paris Smaragdis,et al. Deep learning for monaural speech separation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[25] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1984, IEEE Transactions on Acoustics, Speech, and Signal Processing.
[26] Scott Rickard,et al. Blind separation of speech mixtures via time-frequency masking , 2004, IEEE Transactions on Signal Processing.
[27] Kristen Grauman,et al. Im2Flow: Motion Hallucination from Static Images for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[28] John William Strutt. Scientific Papers: Our Perception of the Direction of a Source of Sound , 2009.
[29] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[30] Javier R. Movellan,et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds , 1999, NIPS.
[31] Bhiksha Raj,et al. Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures , 2007, ICA.
[32] John William Strutt. Our Perception of the Direction of a Source of Sound , 1876, Nature.
[33] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Lorenzo Torresani,et al. Co-Training of Audio and Video Representations from Self-Supervised Temporal Synchronization , 2018, ArXiv.
[35] Patrick Pérez,et al. Motion informed audio source separation , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Hiroaki Kitano,et al. Real-time speaker localization and speech separation by audio-visual integration , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).
[37] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Trevor Darrell,et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation , 2000, NIPS.
[39] Daniel P. W. Ellis,et al. Source separation based on binaural cues and source model constraints , 2008, INTERSPEECH.
[40] Edgar A. Torres-Gallegos,et al. Personalization of head-related transfer functions (HRTF) based on automatic photo-anthropometry and inference from a database , 2015.
[41] DeLiang Wang,et al. Deep Learning Based Binaural Speech Separation in Reverberant Environments , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[42] Jiajun Wu,et al. Generative Modeling of Audible Shapes for Object Perception , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[43] Rogério Schmidt Feris,et al. Learning to Separate Object Sounds by Watching Unlabeled Video , 2018, ECCV.
[44] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[45] Volker Gnann. Source-Filter Based Clustering for Monaural Blind Source Separation , 2009.
[46] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[47] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[48] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[49] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[50] Dingzeyu Li,et al. Scene-aware audio for 360° videos , 2018, ACM Trans. Graph.
[51] W. Koenig,et al. Subjective Effects in Binaural Hearing , 1950.
[52] Dingzeyu Li,et al. Scene-Aware Audio for 360° Videos , 2018, ArXiv.
[53] Gaurav Sharma,et al. See and listen: Score-informed association of sound tracks to players in chamber music performance videos , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Christian Jutten,et al. Two multimodal approaches for single microphone source separation , 2016, 2016 24th European Signal Processing Conference (EUSIPCO).
[55] Radu Horaud,et al. The cocktail party robot: Sound source separation and localisation with an active binaural head , 2012, 2012 7th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[56] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[57] Michael Elad,et al. Pixels that sound , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).