Paper information: Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

Luc Van Gool | Dengxin Dai | Arun Balajee Vasudevan

[1] Nuno Vasconcelos, et al. Self-Supervised Generation of Spatial Audio for 360 Video, 2018, NIPS 2018.

[2] Dingzeyu Li, et al. Scene-aware audio for 360° videos, 2018, ACM Trans. Graph.

[3] Chuang Gan, et al. Self-Supervised Moving Vehicle Tracking With Stereo Sound, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4] Antonio Torralba, et al. LabelMe: A Database and Web-Based Tool for Image Annotation, 2008, International Journal of Computer Vision.

[5] William W. Gaver. What in the World Do We Hear? An Ecological Approach to Auditory Event Perception, 1993.

[6] Jae S. Lim, et al. Signal estimation from modified short-time Fourier transform, 1983, ICASSP.

[7] Andrew Owens, et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, 2018, ECCV.

[8] Benjamin Höferlin, et al. Evaluation of background subtraction techniques for video surveillance, 2011, CVPR 2011.

[9] John W. McDonough, et al. Kalman Filters for Time Delay of Arrival-Based Source Localization, 2005, EURASIP J. Adv. Signal Process.

[10] Ingmar Posner, et al. Leveraging the urban soundscape: Auditory perception for smart vehicles, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[11] Lawrence D. Rosenblum, et al. Echolocating Distance by Moving and Stationary Listeners, 2000.

[12] Hirokazu Kameoka, et al. Seeing through Sounds: Predicting Visual Semantic Segmentation Results from Multichannel Audio Signals, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13] Emanuel A. P. Habets, et al. Inference of Room Geometry From Acoustic Impulse Responses, 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[14] Justin Salamon, et al. A Dataset and Taxonomy for Urban Sound Research, 2014, ACM Multimedia.

[15] H. Wallach, et al. The role of head movements and vestibular and visual cues in sound localization, 1940.

[16] Iván V. Meza, et al. Localization of sound sources in robotics: A review, 2017, Robotics Auton. Syst.

[17] Andrew Zisserman, et al. Emotion Recognition in Speech using Cross-Modal Transfer in the Wild, 2018, ACM Multimedia.

[18] Guy J. Brown, et al. Computational auditory scene analysis, 1994, Comput. Speech Lang.

[19] Gabriel J. Brostow, et al. Digging Into Self-Supervised Monocular Depth Estimation, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20] Dinesh Manocha, et al. 3D Reconstruction in the presence of glasses by acoustic and stereo fusion, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Rogério Schmidt Feris, et al. Learning to Separate Object Sounds by Watching Unlabeled Video, 2018, ECCV.

[22] Marie-Francine Moens, et al. Talk2Car: Taking Control of Your Self-Driving Car, 2019, EMNLP.

[23] Roland Siegwart, et al. The current state and future outlook of rescue robotics, 2019, J. Field Robotics.

[24] H. Bülthoff, et al. Merging the senses into a robust percept, 2004, Trends in Cognitive Sciences.

[25] W. R. Thurlow, et al. Head movements during sound localization, 1967, The Journal of the Acoustical Society of America.

[26] Luc Van Gool, et al. Revisiting Multi-Task Learning in the Deep Learning Era, 2020, ArXiv.

[27] Paul Hurley, et al. DeepWave: A Recurrent Neural-Network for Real-Time Acoustic Imaging, 2019, NeurIPS.

[28] Andreas Geiger, et al. Vision meets robotics: The KITTI dataset, 2013, Int. J. Robotics Res.

[29] Kristen Grauman, et al. Co-Separating Sounds of Visual Objects, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.

[31] Ashutosh Saxena, et al. Learning sound location from a single microphone, 2009, 2009 IEEE International Conference on Robotics and Automation.

[32] George Papandreou, et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, 2018, ECCV.

[33] Andrew Owens, et al. Ambient Sound Provides Supervision for Visual Learning, 2016, ECCV.

[34] Luc Van Gool, et al. End-to-End Learning of Driving Models with Surround-View Cameras and Route Planners, 2018, ECCV.

[35] Andrew Zisserman, et al. Look, Listen and Learn, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36] Russell L. Martin, et al. Sound localization with head movement: implications for 3-d audio displays, 2014, Front. Neurosci.

[37] Chenliang Xu, et al. Audio-Visual Event Localization in Unconstrained Videos, 2018, ECCV.

[38] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[39] Adrian Hilton, et al. 3D Room Geometry Reconstruction Using Audio-Visual Sensors, 2017, 2017 International Conference on 3D Vision (3DV).

[40] Martin Vetterli, et al. Acoustic echoes reveal room shape, 2013, Proceedings of the National Academy of Sciences.

[41] Andrew Owens, et al. Visually Indicated Sounds, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Robert Fendrich, et al. The Merging of the Senses, 1993, Journal of Cognitive Neuroscience.

[43] Federico Domínguez, et al. SoundCompass: A Distributed MEMS Microphone Array-Based Sensor for Sound Source Localization, 2014, Sensors.

[44] Alan L. Yuille, et al. Towards unified depth and semantic prediction from a single image, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45] Weidong Huang, et al. Human Factors in Augmented Reality Environments, 2012, Springer New York.

[46] Tae-Hyun Oh, et al. Learning to Localize Sound Source in Visual Scenes, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47] Bruno Fazenda, et al. Acoustic based safety emergency vehicle detection for intelligent transport systems, 2009, 2009 ICCAS-SICE.

[48] Chuang Gan, et al. The Sound of Motions, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49] Sebastian Ramos, et al. The Cityscapes Dataset for Semantic Urban Scene Understanding, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Iasonas Kokkinos, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51] Philippe Souères, et al. A survey on sound source localization in robotics: From binaural to array processing methods, 2015, Comput. Speech Lang.

[52] Alejandro Cartas, et al. Seeing and Hearing Egocentric Actions: How Much Can We Learn?, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[53] Luc Van Gool, et al. Object Referring in Visual Scene with Spoken Language, 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[54] Andrew Zisserman, et al. Objects that Sound, 2017, ECCV.

[55] Yoav Y. Schechner, et al. Harmony in Motion, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[56] Chuang Gan, et al. The Sound of Pixels, 2018, ECCV.