SoundSpaces: Audio-Visual Navigation in 3D Environments
Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman