Andrew Owens | Ziyang Chen | Xixi Hu
[1] Arkadiusz Stopczynski,et al. Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] D. A. Riley,et al. Evidence for echolocation in the rat, 1955, Science.
[3] Virginia R. de Sa,et al. Learning Classification with Unlabeled Data , 1993, NIPS.
[4] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[5] Weifeng Chen,et al. Single-Image Depth Perception in the Wild , 2016, NIPS.
[6] Kristen Grauman,et al. Semantic Audio-Visual Navigation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Abhinav Gupta,et al. Learning a Predictable and Generative Vector Representation for Objects , 2016, ECCV.
[8] Sebastian Thrun,et al. Affine Structure From Sound , 2005, NIPS.
[9] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[10] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[11] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[13] Weifeng Chen,et al. Learning Single-Image Depth From Videos Using Quality Assessment Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Kristen Grauman,et al. VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency , 2021, Computer Vision and Pattern Recognition.
[15] James R. Glass,et al. Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input , 2018, ECCV.
[16] Peter Gerstoft,et al. Extracting time‐domain Green's function estimates from ambient seismic noise, 2005.
[17] Andrew Owens,et al. Self-Supervised Learning of Audio-Visual Objects from Video , 2020, ECCV.
[18] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[19] Yong Jae Lee,et al. Audiovisual SlowFast Networks for Video Recognition , 2020, ArXiv.
[20] Jitendra Malik,et al. Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[21] Iván V. Meza,et al. Localization of sound sources in robotics: A review , 2017, Robotics Auton. Syst.
[22] Justin Salamon,et al. Telling Left From Right: Learning Spatial Correspondence of Sight and Sound , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[25] Weiyao Lin,et al. Multiple Sound Sources Localization from Coarse to Fine , 2020, ECCV.
[26] Chuang Gan,et al. The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[27] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] L. Verhoeven,et al. Can one Hear the Shape of a Drum?, 2015.
[29] Richard I. Hartley,et al. In Defense of the Eight-Point Algorithm , 1997, IEEE Trans. Pattern Anal. Mach. Intell..
[30] Josh H McDermott,et al. Statistics of natural reverberation enable perceptual separation of sound and space , 2016, Proceedings of the National Academy of Sciences.
[31] Razvan Pascanu,et al. Learning to Navigate in Complex Environments , 2016, ICLR.
[32] Xavier Serra,et al. Freesound technical demo , 2013, ACM Multimedia.
[33] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[34] Jan-Michael Frahm,et al. 3D model matching with Viewpoint-Invariant Patches (VIP) , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[35] Andrew Zisserman,et al. Vggsound: A Large-Scale Audio-Visual Dataset , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Lorenzo Torresani,et al. Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization , 2018, NeurIPS.
[37] Lawrence D. Rosenblum,et al. Hearing Silent Shapes: Identifying the Shape of a Sound-Obstructing Surface, 2007.
[38] Bernhard P. Wrobel,et al. Multiple View Geometry in Computer Vision, 2001.
[39] Joon Son Chung,et al. Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Alexei A. Efros,et al. Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.
[41] Jitendra Malik,et al. Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[42] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[44] Hod Lipson,et al. The Boombox: Visual Reconstruction from Acoustic Vibrations , 2021, CoRL.
[45] Jitendra Malik,et al. Mesh R-CNN , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[46] Rob Fergus,et al. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[47] Chuang Gan,et al. Look, Listen, and Act: Towards Audio-Visual Embodied Navigation , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).
[48] David F. Fouhey,et al. Associative3D: Volumetric Reconstruction from Sparse Views , 2020, ECCV.
[49] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[50] Jessika Weiss,et al. Vision Science Photons To Phenomenology, 2016.
[51] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[52] Rogério Schmidt Feris,et al. Learning to Separate Object Sounds by Watching Unlabeled Video , 2018, ECCV.
[53] Andrea Vedaldi,et al. Labelling unlabelled videos from scratch with multi-modal self-supervision , 2020, NeurIPS.
[54] Noah Snavely,et al. Extreme Rotation Estimation using Dense Correlation Volumes , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Zhengqi Li,et al. MegaDepth: Learning Single-View Depth Prediction from Internet Photos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[56] Sascha Hornauer,et al. BatVision: Learning to See 3D Spatial Layout with Two Ears , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).
[57] Peter Gerstoft,et al. Ocean bottom profiling with ambient noise: a model for the passive fathometer, 2011, The Journal of the Acoustical Society of America.
[58] Xavier Serra,et al. Freesound Datasets: A Platform for the Creation of Open Audio Datasets , 2017, ISMIR.
[59] Nuno Vasconcelos,et al. Self-Supervised Generation of Spatial Audio for 360 Video , 2018, NIPS 2018.
[60] Chunhua Shen,et al. Learning to Recover 3D Scene Shape from a Single Image , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[61] P. van Kranenburg,et al. International Society for Music Information Retrieval, 2014.
[62] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[63] Nuno Vasconcelos,et al. Audio-Visual Instance Discrimination with Cross-Modal Agreement , 2020, ArXiv.
[64] Daniel H. Ashmead,et al. Obstacle perception by congenitally blind children, 1989.
[65] Abhinav Gupta,et al. Learning to fly by crashing , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[66] Kristen Grauman,et al. Learning to Set Waypoints for Audio-Visual Navigation, 2020.
[67] Weiyao Lin,et al. Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching , 2020, NeurIPS.
[68] Dinesh Manocha,et al. Reflection-Aware Sound Source Localization , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[69] D H Ashmead,et al. Auditory perception of walls via spectral variations in the ambient sound field, 1999, Journal of rehabilitation research and development.
[70] Gabriel J. Brostow,et al. Digging Into Self-Supervised Monocular Depth Estimation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[71] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[72] Robert S. Wall,et al. Low Frequency Sound as a Navigational Tool for People with Visual Impairments, 2002.
[73] Chuang Gan,et al. Self-Supervised Moving Vehicle Tracking With Stereo Sound , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[74] K. Grauman,et al. VisualEchoes: Spatial Image Representation Learning through Echolocation , 2020, ECCV.
[75] Kristen Grauman,et al. SoundSpaces: Audio-Visual Navigation in 3D Environments , 2020, ECCV.
[76] David Guth,et al. Echolocation Reconsidered: Using Spatial Variations in the Ambient Sound Field to Guide Locomotion, 1998.
[77] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[78] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[79] Kristen Grauman,et al. Audio-Visual Floorplan Reconstruction , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[80] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[81] Andrea Vedaldi,et al. Localizing Visual Sounds the Hard Way , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).