Audio Visual Language Maps for Robot Navigation
[1] Krishna Murthy Jatavallabhula, et al. ConceptFusion: Open-set Multimodal 3D Mapping, 2023, Robotics: Science and Systems.
[2] Benjamin Elizalde, et al. CLAP: Learning Audio Concepts From Natural Language Supervision, 2022, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] T. Welschehold, et al. Catch Me if You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments With Moving Sounds, 2021, IEEE Robotics and Automation Letters.
[4] T. Funkhouser, et al. OpenScene: 3D Scene Understanding with Open Vocabularies, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] A. Roy-Chowdhury, et al. AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments, 2022, NeurIPS.
[6] Andy Zeng, et al. Visual Language Maps for Robot Navigation, 2022, 2023 IEEE International Conference on Robotics and Automation (ICRA).
[7] Arthur D. Szlam, et al. CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory, 2022, Robotics: Science and Systems.
[8] Jessica Borja-Diaz, et al. Grounding Language with Visual Affordances over Unstructured Data, 2022, 2023 IEEE International Conference on Robotics and Automation (ICRA).
[9] M. Ryoo, et al. Open-vocabulary Queryable Scene Representations for Real World Planning, 2022, 2023 IEEE International Conference on Robotics and Automation (ICRA).
[10] J. Boedecker, et al. Latent Plans for Task-Agnostic Offline Reinforcement Learning, 2022, CoRL.
[11] Peter R. Florence, et al. Code as Policies: Language Model Programs for Embodied Control, 2022, ArXiv.
[12] S. Levine, et al. LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action, 2022, CoRL.
[13] Josh H. McDermott, et al. Finding Fallen Objects Via Asynchronous Audio-Visual Integration, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Oier Mees, et al. What Matters in Language Conditioned Robotic Imitation Learning Over Unstructured Data, 2022, IEEE Robotics and Automation Letters.
[15] Kilian Q. Weinberger, et al. Language-driven Semantic Segmentation, 2022, ICLR.
[16] W. Burgard, et al. CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks, 2021, IEEE Robotics and Automation Letters.
[17] J. Bello, et al. Wav2CLIP: Learning Robust Audio Representations from CLIP, 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Federico Raue, et al. AudioCLIP: Extending CLIP to Image, Text and Audio, 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Yin Cui, et al. Open-vocabulary Object Detection via Vision and Language Knowledge Distillation, 2021, ICLR.
[20] Ludwig Schmidt, et al. CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration, 2022, ArXiv.
[21] Dieter Fox, et al. CLIPort: What and Where Pathways for Robotic Manipulation, 2021, CoRL.
[22] Cordelia Schmid, et al. Airbert: In-domain Pretraining for Vision-and-Language Navigation, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Angel X. Chang, et al. Habitat 2.0: Training Home Assistants to Rearrange their Habitat, 2021, NeurIPS.
[24] Yann LeCun, et al. MDETR - Modulated Detection for End-to-End Multi-Modal Understanding, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[25] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[26] Santhosh K. Ramakrishnan, et al. Learning to Set Waypoints for Audio-Visual Navigation, 2020, ICLR.
[27] Josh H. McDermott, et al. ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation, 2020, NeurIPS Datasets and Benchmarks.
[28] Golnaz Ghiasi, et al. Open-Vocabulary Image Segmentation, 2021, ArXiv.
[29] Alec Radford, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021.
[30] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[31] Jacob Krantz, et al. Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments, 2020, ECCV.
[32] J. Tenenbaum, et al. Look, Listen, and Act: Towards Audio-Visual Embodied Navigation, 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).
[33] K. Grauman, et al. SoundSpaces: Audio-Visual Navigation in 3D Environments, 2019, ECCV.
[34] Tomasz Malisiewicz, et al. SuperGlue: Learning Feature Matching With Graph Neural Networks, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Jitendra Malik, et al. Habitat: A Platform for Embodied AI Research, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[36] Binbin Xu, et al. MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM, 2018, 2019 International Conference on Robotics and Automation (ICRA).
[37] Roland Siegwart, et al. From Coarse to Fine: Robust Hierarchical Localization at Large Scale, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Stefan Leutenegger, et al. Fusion++: Volumetric Object-Level SLAM, 2018, 2018 International Conference on 3D Vision (3DV).
[39] Dan Klein, et al. Speaker-Follower Models for Vision-and-Language Navigation, 2018, NeurIPS.
[40] Lourdes Agapito, et al. MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects, 2018, 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR).
[41] Tomasz Malisiewicz, et al. SuperPoint: Self-Supervised Interest Point Detection and Description, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[42] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[43] Tomáš Pajdla, et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[44] Ali Farhadi, et al. AI2-THOR: An Interactive 3D Environment for Visual AI, 2017, ArXiv.
[45] Matthias Nießner, et al. Matterport3D: Learning from RGB-D Data in Indoor Environments, 2017, 2017 International Conference on 3D Vision (3DV).
[46] Stefan Leutenegger, et al. SemanticFusion: Dense 3D semantic mapping with convolutional neural networks, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[47] Karol J. Piczak. ESC: Dataset for Environmental Sound Classification, 2015, ACM Multimedia.
[48] Paul H. J. Kelly, et al. SLAM++: Simultaneous Localisation and Mapping at the Level of Objects, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[49] Morgan Quigley, et al. ROS: an open-source Robot Operating System, 2009, ICRA 2009.
[50] Konrad Paul Kording, et al. Causal Inference in Multisensory Perception, 2007, PLoS ONE.
[51] Robert C. Bolles, et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, 1981, CACM.