暂无分享,去创建一个
[1] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Luc Van Gool,et al. Object Referring in Visual Scene with Spoken Language , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[3] Ashutosh Saxena,et al. Learning sound location from a single microphone , 2009, 2009 IEEE International Conference on Robotics and Automation.
[4] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[5] Andrew Zisserman,et al. Emotion Recognition in Speech using Cross-Modal Transfer in the Wild , 2018, ACM Multimedia.
[6] William Whittaker,et al. Autonomous driving in urban environments: Boss and the Urban Challenge , 2008, J. Field Robotics.
[7] Rogério Schmidt Feris,et al. Learning to Separate Object Sounds by Watching Unlabeled Video , 2018, ECCV.
[8] Gabriel J. Brostow,et al. Digging Into Self-Supervised Monocular Depth Estimation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[10] Lawrence D. Rosenblum,et al. Echolocating Distance by Moving and Stationary Listeners , 2000 .
[11] Dingzeyu Li,et al. Scene-aware audio for 360° videos , 2018, ACM Trans. Graph..
[12] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[13] H. Wallach,et al. The role of head movements and vestibular and visual cues in sound localization. , 1940 .
[14] Russell L. Martin,et al. Sound localization with head movement: implications for 3-d audio displays , 2014, Front. Neurosci..
[15] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[16] Marie-Francine Moens,et al. Talk2Car: Taking Control of Your Self-Driving Car , 2019, EMNLP.
[17] Luc Van Gool,et al. Revisiting Multi-Task Learning in the Deep Learning Era , 2020, ArXiv.
[18] H. Bülthoff,et al. Merging the senses into a robust percept , 2004, Trends in Cognitive Sciences.
[19] Sidney S. Simon,et al. Merging of the Senses , 2008, Front. Neurosci..
[20] Federico Domínguez,et al. SoundCompass: A Distributed MEMS Microphone Array-Based Sensor for Sound Source Localization , 2014, Sensors.
[21] Benjamin Höferlin,et al. Evaluation of background subtraction techniques for video surveillance , 2011, CVPR 2011.
[22] Emanuel A. P. Habets,et al. Inference of Room Geometry From Acoustic Impulse Responses , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[23] Weidong Huang,et al. Human Factors in Augmented Reality Environments , 2012, Springer New York.
[24] Alan L. Yuille,et al. Towards unified depth and semantic prediction from a single image , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Dinesh Manocha,et al. 3D Reconstruction in the presence of glasses by acoustic and stereo fusion , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Hirokazu Kameoka,et al. Seeing through Sounds: Predicting Visual Semantic Segmentation Results from Multichannel Audio Signals , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[28] Roland Siegwart,et al. The current state and future outlook of rescue robotics , 2019, J. Field Robotics.
[29] Guy J. Brown,et al. Computational auditory scene analysis , 1994, Comput. Speech Lang..
[30] Jana Kosecka,et al. Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).
[31] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[32] Alejandro Cartas,et al. Seeing and Hearing Egocentric Actions: How Much Can We Learn? , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[33] Yoav Y. Schechner,et al. Harmony in Motion , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[34] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Bruno Fazenda,et al. Acoustic based safety emergency vehicle detection for intelligent transport systems , 2009, 2009 ICCAS-SICE.
[36] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[37] John W. McDonough,et al. Kalman Filters for Time Delay of Arrival-Based Source Localization , 2005, EURASIP J. Adv. Signal Process..
[38] William W. Gaver. What in the World Do We Hear? An Ecological Approach to Auditory Event Perception , 1993 .
[39] Chuang Gan,et al. Self-Supervised Moving Vehicle Tracking With Stereo Sound , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[40] Andreas Geiger,et al. Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..
[41] Martin Vetterli,et al. Acoustic echoes reveal room shape , 2013, Proceedings of the National Academy of Sciences.
[42] Chuang Gan,et al. The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[43] Adrian Hilton,et al. 3D Room Geometry Reconstruction Using Audio-Visual Sensors , 2017, 2017 International Conference on 3D Vision (3DV).
[44] Philippe Souères,et al. A survey on sound source localization in robotics: From binaural to array processing methods , 2015, Comput. Speech Lang..
[45] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Paul Hurley,et al. DeepWave: A Recurrent Neural-Network for Real-Time Acoustic Imaging , 2019, NeurIPS.
[47] Justin Salamon,et al. A Dataset and Taxonomy for Urban Sound Research , 2014, ACM Multimedia.
[48] Nuno Vasconcelos,et al. Self-Supervised Generation of Spatial Audio for 360 Video , 2018, NIPS 2018.
[49] Antonio Torralba,et al. LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.
[50] Luc Van Gool,et al. End-to-End Learning of Driving Models with Surround-View Cameras and Route Planners , 2018, ECCV.
[51] Ingmar Posner,et al. Leveraging the urban soundscape: Auditory perception for smart vehicles , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[52] Iván V. Meza,et al. Localization of sound sources in robotics: A review , 2017, Robotics Auton. Syst..
[53] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[54] George Papandreou,et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.
[55] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[56] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[57] William Whittaker,et al. Autonomous driving in urban environments: Boss and the Urban Challenge , 2008, J. Field Robotics.
[58] Kristen Grauman,et al. Co-Separating Sounds of Visual Objects , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).