Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

[1]  Nuno Vasconcelos,et al.  Self-Supervised Generation of Spatial Audio for 360 Video , 2018, NIPS 2018.

[2]  Dingzeyu Li,et al.  Scene-aware audio for 360° videos , 2018, ACM Trans. Graph..

[3]  Chuang Gan,et al.  Self-Supervised Moving Vehicle Tracking With Stereo Sound , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[5]  William W. Gaver What in the World Do We Hear? An Ecological Approach to Auditory Event Perception , 1993 .

[6]  Jae S. Lim,et al.  Signal estimation from modified short-time Fourier transform , 1983, ICASSP.

[7]  Andrew Owens,et al.  Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.

[8]  Benjamin Höferlin,et al.  Evaluation of background subtraction techniques for video surveillance , 2011, CVPR 2011.

[9]  John W. McDonough,et al.  Kalman Filters for Time Delay of Arrival-Based Source Localization , 2005, EURASIP J. Adv. Signal Process..

[10]  Ingmar Posner,et al.  Leveraging the urban soundscape: Auditory perception for smart vehicles , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Lawrence D. Rosenblum,et al.  Echolocating Distance by Moving and Stationary Listeners , 2000 .

[12]  Hirokazu Kameoka,et al.  Seeing through Sounds: Predicting Visual Semantic Segmentation Results from Multichannel Audio Signals , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13]  Emanuel A. P. Habets,et al.  Inference of Room Geometry From Acoustic Impulse Responses , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[14]  Justin Salamon,et al.  A Dataset and Taxonomy for Urban Sound Research , 2014, ACM Multimedia.

[15]  H. Wallach,et al.  The role of head movements and vestibular and visual cues in sound localization. , 1940 .

[16]  Iván V. Meza,et al.  Localization of sound sources in robotics: A review , 2017, Robotics Auton. Syst..

[17]  Andrew Zisserman,et al.  Emotion Recognition in Speech using Cross-Modal Transfer in the Wild , 2018, ACM Multimedia.

[18]  Guy J. Brown,et al.  Computational auditory scene analysis , 1994, Comput. Speech Lang..

[19]  Gabriel J. Brostow,et al.  Digging Into Self-Supervised Monocular Depth Estimation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Dinesh Manocha,et al.  3D Reconstruction in the presence of glasses by acoustic and stereo fusion , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Rogério Schmidt Feris,et al.  Learning to Separate Object Sounds by Watching Unlabeled Video , 2018, ECCV.

[22]  Marie-Francine Moens,et al.  Talk2Car: Taking Control of Your Self-Driving Car , 2019, EMNLP.

[23]  Roland Siegwart,et al.  The current state and future outlook of rescue robotics , 2019, J. Field Robotics.

[24]  H. Bülthoff,et al.  Merging the senses into a robust percept , 2004, Trends in Cognitive Sciences.

[25]  W R Thurlow,et al.  Head movements during sound localization. , 1967, The Journal of the Acoustical Society of America.

[26]  Luc Van Gool,et al.  Revisiting Multi-Task Learning in the Deep Learning Era , 2020, ArXiv.

[27]  Paul Hurley,et al.  DeepWave: A Recurrent Neural-Network for Real-Time Acoustic Imaging , 2019, NeurIPS.

[28]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[29]  Kristen Grauman,et al.  Co-Separating Sounds of Visual Objects , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Antonio Torralba,et al.  SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.

[31]  Ashutosh Saxena,et al.  Learning sound location from a single microphone , 2009, 2009 IEEE International Conference on Robotics and Automation.

[32]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[33]  Andrew Owens,et al.  Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.

[34]  Luc Van Gool,et al.  End-to-End Learning of Driving Models with Surround-View Cameras and Route Planners , 2018, ECCV.

[35]  Andrew Zisserman,et al.  Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Russell L. Martin,et al.  Sound localization with head movement: implications for 3-d audio displays , 2014, Front. Neurosci..

[37]  Chenliang Xu,et al.  Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.

[38]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[39]  Adrian Hilton,et al.  3D Room Geometry Reconstruction Using Audio-Visual Sensors , 2017, 2017 International Conference on 3D Vision (3DV).

[40]  Martin Vetterli,et al.  Acoustic echoes reveal room shape , 2013, Proceedings of the National Academy of Sciences.

[41]  Andrew Owens,et al.  Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Robert Fendrich,et al.  The Merging of the Senses , 1993, Journal of Cognitive Neuroscience.

[43]  Federico Domínguez,et al.  SoundCompass: A Distributed MEMS Microphone Array-Based Sensor for Sound Source Localization , 2014, Sensors.

[44]  Alan L. Yuille,et al.  Towards unified depth and semantic prediction from a single image , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Weidong Huang,et al.  Human Factors in Augmented Reality Environments , 2012, Springer New York.

[46]  Tae-Hyun Oh,et al.  Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Bruno Fazenda,et al.  Acoustic based safety emergency vehicle detection for intelligent transport systems , 2009, 2009 ICCAS-SICE.

[48]  Chuang Gan,et al.  The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Philippe Souères,et al.  A survey on sound source localization in robotics: From binaural to array processing methods , 2015, Comput. Speech Lang..

[52]  Alejandro Cartas,et al.  Seeing and Hearing Egocentric Actions: How Much Can We Learn? , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[53]  Luc Van Gool,et al.  Object Referring in Visual Scene with Spoken Language , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[54]  Andrew Zisserman,et al.  Objects that Sound , 2017, ECCV.

[55]  Yoav Y. Schechner,et al.  Harmony in Motion , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Chuang Gan,et al.  The Sound of Pixels , 2018, ECCV.