Embodied Symbiotic Assistants that See, Act, Infer and Chat
Katerina Fragkiadaki, Xian Zhou, Shikhar Sharma, N. Gkanatsios, Ayush Jain, Gabriel H. Sarch, Yuchen Cao, Nilay Pande
[1] Katerina Fragkiadaki, et al. Analogy-Forming Transformers for Few-Shot 3D Parsing, 2023, ICLR.
[2] Dilek Z. Hakkani-Tür, et al. Alexa Arena: A User-Centric Interactive Platform for Embodied AI, 2023, ArXiv.
[3] Ludwig Schmidt, et al. LAION-5B: An open large-scale dataset for training next generation image-text models, 2022, NeurIPS.
[4] Li Dong, et al. Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, 2022, ArXiv.
[5] Adam W. Harley, et al. TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors, 2022, ECCV.
[6] Adam W. Harley, et al. Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?, 2022, 2023 IEEE International Conference on Robotics and Automation (ICRA).
[7] Liunian Harold Li, et al. GLIPv2: Unifying Localization and Vision-Language Understanding, 2022, ArXiv.
[8] Zirui Wang, et al. CoCa: Contrastive Captioners are Image-Text Foundation Models, 2022, Trans. Mach. Learn. Res..
[9] Katerina Fragkiadaki, et al. Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds, 2021, ECCV.
[10] Diego de Las Casas, et al. Improving language models by retrieving from trillions of tokens, 2021, ICML.
[11] Devendra Singh Chaplot, et al. FILM: Following Instructions in Language with Modular Methods, 2021, ICLR.
[12] Dilek Z. Hakkani-Tür, et al. TEACh: Task-driven Embodied Agents that Chat, 2021, AAAI.
[13] Michael S. Bernstein, et al. On the Opportunities and Risks of Foundation Models, 2021, ArXiv.
[14] Alessandro Suglia, et al. Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion, 2021, ArXiv.
[15] Chi-Keung Tang, et al. Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation, 2021, NeurIPS.
[16] Roozbeh Mottaghi, et al. Visual Room Rearrangement, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Joshua B. Tenenbaum, et al. The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark Towards Physically Realistic Embodied AI, 2021, 2022 International Conference on Robotics and Automation (ICRA).
[18] Lyne P. Tchapmi, et al. iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes, 2020, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[19] Roozbeh Mottaghi, et al. Rearrangement: A Challenge for Embodied AI, 2020, ArXiv.
[20] Santhosh K. Ramakrishnan, et al. Occupancy Anticipation for Efficient Exploration and Navigation, 2020, ECCV.
[21] Josh H. McDermott, et al. ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation, 2020, NeurIPS Datasets and Benchmarks.
[22] Ruslan Salakhutdinov, et al. Object Goal Navigation using Goal-Oriented Semantic Exploration, 2020, NeurIPS.
[23] Arjun Gupta, et al. Semantic Visual Navigation by Watching YouTube Videos, 2020, NeurIPS.
[24] Abhinav Gupta, et al. Semantic Curiosity for Active Visual Learning, 2020, ECCV.
[25] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[26] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.
[27] Ruslan Salakhutdinov, et al. Learning to Explore using Active Neural SLAM, 2020, ICLR.
[28] Luke Zettlemoyer, et al. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Ari S. Morcos, et al. DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames, 2019, ICLR.
[30] S. Levine, et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning, 2019, CoRL.
[31] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res..
[32] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[33] Katerina Fragkiadaki, et al. Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping, 2019, ICLR.
[34] Jitendra Malik, et al. Habitat: A Platform for Embodied AI Research, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[35] Tao Chen, et al. Learning Exploration Policies for Navigation, 2019, ICLR.
[36] Katerina Fragkiadaki, et al. Learning Spatial Common Sense With Geometry-Aware Recurrent Networks, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Ali Farhadi, et al. Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Silvio Savarese, et al. SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark, 2018, CoRL.
[39] Ali Farhadi, et al. Visual Semantic Navigation using Scene Priors, 2018, ICLR.
[40] Jitendra Malik, et al. On Evaluation of Embodied Navigation Agents, 2018, ArXiv.
[41] Luke S. Zettlemoyer, et al. AllenNLP: A Deep Semantic Natural Language Processing Platform, 2018, ArXiv.
[42] Daniel L. K. Yamins, et al. Learning to Play with Intrinsically-Motivated Self-Aware Agents, 2018, NeurIPS.
[43] Ali Farhadi, et al. AI2-THOR: An Interactive 3D Environment for Visual AI, 2017, ArXiv.
[44] Ali Farhadi, et al. IQA: Visual Question Answering in Interactive Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[45] Georgia Gkioxari, et al. Embodied Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[46] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Liang Wang, et al. Referring Expression Generation and Comprehension via Attributes, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[48] Li Fei-Fei, et al. Inferring and Executing Programs for Visual Reasoning, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[49] Alvin Cheung, et al. Learning a Neural Semantic Parser from User Feedback, 2017, ACL.
[50] Trevor Darrell, et al. Learning to Reason: End-to-End Module Networks for Visual Question Answering, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[51] Ross B. Girshick, et al. Mask R-CNN, 2017, ArXiv.
[52] Demis Hassabis, et al. Neural Episodic Control, 2017, ICML.
[53] Rahul Sukthankar, et al. Cognitive Mapping and Planning for Visual Navigation, 2017, International Journal of Computer Vision.
[54] Sebastian Nowozin, et al. DeepCoder: Learning to Write Programs, 2016, ICLR.
[55] Quoc V. Le, et al. Learning a Natural Language Interface with Neural Programmer, 2016, ICLR.
[56] Chen Liang, et al. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision, 2016, ACL.
[57] Noah A. Smith, et al. Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs, 2016, CoNLL.
[58] Percy Liang, et al. Learning executable semantic parsers for natural language understanding, 2016, Commun. ACM.
[59] Amos Azaria, et al. Instructable Intelligent Personal Agent, 2016, AAAI.
[60] Dan Klein, et al. Learning to Compose Neural Networks for Question Answering, 2016, NAACL.
[61] Mirella Lapata, et al. Language to Logical Form with Neural Attention, 2016, ACL.
[62] Luke S. Zettlemoyer, et al. Question-Answer Driven Semantic Role Labeling: Using Natural Language to Annotate Natural Language, 2015, EMNLP.
[63] Ming-Wei Chang, et al. Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base, 2015, ACL.
[64] Wei Xu, et al. End-to-end learning of semantic role labeling using recurrent neural networks, 2015, ACL.
[65] Jonathan Berant, et al. Building a Semantic Parser Overnight, 2015, ACL.
[66] Navdeep Jaitly, et al. Pointer Networks, 2015, NIPS.
[67] Christopher D. Manning, et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, 2015, ACL.
[68] D. Kahneman. Thinking, Fast and Slow, 2011, Farrar, Straus and Giroux.
[69] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[70] Tom M. Mitchell, et al. Joint Syntactic and Semantic Parsing with Combinatory Categorial Grammar, 2014, ACL.
[71] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[72] Andrew Chou, et al. Semantic Parsing on Freebase from Question-Answer Pairs, 2013, EMNLP.
[73] Ming-Wei Chang, et al. Driving Semantic Parsing from the World's Response, 2010, CoNLL.
[75] Luke S. Zettlemoyer, et al. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars, 2005, UAI.
[76] Raymond J. Mooney, et al. Learning to Parse Database Queries Using Inductive Logic Programming, 1996, AAAI/IAAI, Vol. 2.
[77] John D. Burger, et al. Problems in Natural-Language Interface to DBMS With Examples From EUFID, 1983, ANLP.
[78] David L. Waltz, et al. An English language question answering system for a large relational database, 1978, CACM.
[79] Gary G. Hendrix, et al. Developing a natural language interface to complex data, 1977, TODS.
[80] F. B. Thompson, et al. REL: A Rapidly Extensible Language system, 1969, ACM '69.
[81] H. Eysenck, et al. Thinking movement and the creation of dance through numbers, 2006.
[82] Adam W. Harley, et al. Move to See Better: Self-Improving Embodied Object Detection, 2021, BMVC.
[83] Ali Farhadi, et al. Learning Generalizable Visual Representations via Interactive Gameplay, 2021, ICLR.
[84] Luke S. Zettlemoyer, et al. Learning to Parse Natural Language Commands to a Robot Control System, 2012, ISER.