Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
[1] Dinesh Manocha, et al. Reflection-Aware Sound Source Localization, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[2] Gregory D. Hager, et al. Uncertainty-Aware Occupancy Map Prediction Using Generative Networks for Robot Navigation, 2019, 2019 International Conference on Robotics and Automation (ICRA).
[3] Jing Xiao, et al. Navigating Dynamically Unknown Environments Leveraging Past Experience, 2019, 2019 International Conference on Robotics and Automation (ICRA).
[4] Michael Elad, et al. Pixels that Sound, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[5] Sergey Levine, et al. (CAD)²RL: Real Single-Image Flight without a Single Real Image, 2016, Robotics: Science and Systems.
[6] Konrad Paul Kording, et al. Causal Inference in Multisensory Perception, 2007, PLoS ONE.
[7] Samarth Brahmbhatt, et al. DeepNav: Learning to Navigate Large Cities, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Tae-Hyun Oh, et al. Learning to Localize Sound Source in Visual Scenes, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[9] Ali Farhadi, et al. Visual Semantic Navigation Using Scene Priors, 2018, ICLR.
[10] Sebastian Thrun, et al. Probabilistic Robotics, 2002, CACM.
[11] Razvan Pascanu, et al. Learning to Navigate in Complex Environments, 2016, ICLR.
[12] Chuang Gan, et al. Self-Supervised Moving Vehicle Tracking with Stereo Sound, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[14] Ali Farhadi, et al. Target-Driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[15] Andrew Owens, et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, 2018, ECCV.
[16] Chuang Gan, et al. Self-Supervised Audio-Visual Co-Segmentation, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Hans P. Moravec. Sensor Fusion in Certainty Grids for Mobile Robots, 1988, AI Magazine.
[18] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Trevor Darrell, et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation, 2000, NIPS.
[20] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[21] Wei Xu, et al. Interactive Grounded Language Acquisition and Generalization in a 2D World, 2018, ICLR.
[22] Kristen Grauman, et al. Audio-Visual Embodied Navigation, 2019, arXiv.
[23] Yoav Y. Schechner, et al. Harmony in Motion, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[24] Matthew R. Walter, et al. Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences, 2015, AAAI.
[25] Jitendra Malik, et al. On Evaluation of Embodied Navigation Agents, 2018, arXiv.
[26] Jitendra Malik, et al. Habitat: A Platform for Embodied AI Research, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[27] Rogério Schmidt Feris, et al. Learning to Separate Object Sounds by Watching Unlabeled Video, 2018, ECCV.
[28] Rahul Sukthankar, et al. Cognitive Mapping and Planning for Visual Navigation, 2017, International Journal of Computer Vision.
[29] Ali Farhadi, et al. AI2-THOR: An Interactive 3D Environment for Visual AI, 2017, arXiv.
[30] Chuang Gan, et al. The Sound of Pixels, 2018, ECCV.
[31] Chuang Gan, et al. The Sound of Motions, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[32] Javier R. Movellan, et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds, 1999, NIPS.
[33] Harry L. Van Trees. Optimum Array Processing: Part IV of Detection, Estimation, and Modulation Theory, 2002.
[34] H. McGurk, et al. Hearing Lips and Seeing Voices, 1976, Nature.
[35] Francis M. Boland, et al. Efficient Encoding and Decoding of Binaural Sound with Resonance Audio, 2019.
[36] Vladlen Koltun, et al. Semi-Parametric Topological Memory for Navigation, 2018, ICLR.
[37] Andrew Zisserman, et al. Objects that Sound, 2017, ECCV.
[38] Yuandong Tian, et al. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning, 2016, ICLR.
[39] Dan Klein, et al. Speaker-Follower Models for Vision-and-Language Navigation, 2018, NeurIPS.
[40] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[41] Kevin Wilson, et al. Looking to Listen at the Cocktail Party, 2018, ACM Transactions on Graphics.
[42] Mubarak Shah, et al. Multimodal Analysis for Identification and Segmentation of Moving-Sounding Objects, 2013, IEEE Transactions on Multimedia.
[43] Jana Kosecka, et al. Visual Representations for Semantic Target Driven Navigation, 2018, 2019 International Conference on Robotics and Automation (ICRA).
[44] Jason Weston, et al. Key-Value Memory Networks for Directly Reading Documents, 2016, EMNLP.
[45] Joon Son Chung, et al. The Conversation: Deep Audio-Visual Speech Enhancement, 2018, INTERSPEECH.
[46] Alessio Del Bue, et al. Seeing the Sound: A New Multimodal Imaging Device for Computer Vision, 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).