VIMA: Robot Manipulation with Multimodal Prompts
Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi (Jim) Fan
[1] Yuke Zhu,et al. Voyager: An Open-Ended Embodied Agent with Large Language Models , 2023, Trans. Mach. Learn. Res..
[2] Yuchen Lu,et al. Hyper-Decision Transformer for Efficient Online Policy Adaptation , 2023, ICLR.
[3] Ross B. Girshick,et al. Segment Anything , 2023, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).
[4] Yecheng Jason Ma,et al. Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? , 2023, NeurIPS.
[5] Luca Weihs,et al. When Learning Is Out of Reach, Reset: Generalization in Autonomous Visuomotor Reinforcement Learning , 2023, ArXiv.
[6] P. Abbeel,et al. Foundation Models for Decision Making: Problems, Methods, and Opportunities , 2023, ArXiv.
[7] Mehdi S. M. Sajjadi,et al. PaLM-E: An Embodied Multimodal Language Model , 2023, ICML.
[8] Karol Hausman,et al. Open-World Object Manipulation using Pre-trained Vision-Language Models , 2023, ArXiv.
[9] Karol Hausman,et al. Scaling Robot Learning with Semantically Imagined Experience , 2023, Robotics: Science and Systems.
[10] Animesh Garg,et al. Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments , 2023, IEEE Robotics and Automation Letters.
[11] A. Rajeswaran,et al. On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline , 2022, ICML.
[12] P. Abbeel,et al. Masked Autoencoding for Scalable and Generalizable Decision Making , 2022, NeurIPS.
[13] P. Stone,et al. VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors , 2022, ArXiv.
[14] Lerrel Pinto,et al. From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data , 2022, ICLR.
[15] P. Abbeel,et al. Real-World Robot Learning with Masked Visual Pre-training , 2022, CoRL.
[16] Yecheng Jason Ma,et al. VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training , 2022, ICLR.
[17] D. Fox,et al. Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation , 2022, CoRL.
[18] Li Dong,et al. Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks , 2022, ArXiv.
[19] Luis F. C. Figueredo,et al. LATTE: LAnguage Trajectory TransformEr , 2022, 2023 IEEE International Conference on Robotics and Automation (ICRA).
[20] Peter R. Florence,et al. Inner Monologue: Embodied Reasoning through Planning with Language Models , 2022, CoRL.
[21] S. Levine,et al. LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action , 2022, CoRL.
[22] J. Tenenbaum,et al. Prompting Decision Transformer for Few-Shot Policy Generalization , 2022, ICML.
[23] J. Clune,et al. Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos , 2022, NeurIPS.
[24] Lerrel Pinto,et al. Behavior Transformers: Cloning k modes with one stone , 2022, NeurIPS.
[25] Aniruddha Kembhavi,et al. Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks , 2022, ICLR.
[26] Anima Anandkumar,et al. MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge , 2022, NeurIPS.
[27] David J. Fleet,et al. A Unified Sequence Interface for Vision Tasks , 2022, NeurIPS.
[28] Ali Farhadi,et al. ProcTHOR: Large-Scale Embodied AI Using Procedural Generation , 2022, NeurIPS.
[29] Juan Carlos Niebles,et al. Revisiting the “Video” in Video-Language Understanding , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] André Susano Pinto,et al. UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes , 2022, NeurIPS.
[31] Thomas Kipf,et al. Simple Open-Vocabulary Object Detection with Vision Transformers , 2022, ArXiv.
[32] Sergio Gomez Colmenarejo,et al. A Generalist Agent , 2022, Trans. Mach. Learn. Res..
[33] N. Codella,et al. i-Code: An Integrative and Composable Multimodal Learning Framework , 2022, AAAI.
[34] Oriol Vinyals,et al. Flamingo: a Visual Language Model for Few-Shot Learning , 2022, NeurIPS.
[35] Vincent Vanhoucke,et al. Google Scanned Objects: A High-Quality Dataset of 3D Scanned Household Items , 2022, 2022 International Conference on Robotics and Automation (ICRA).
[36] Hyung Won Chung,et al. What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? , 2022, ICML.
[37] S. Levine,et al. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances , 2022, CoRL.
[38] Adrian S. Wong,et al. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language , 2022, ICLR.
[39] Vikash Kumar,et al. R3M: A Universal Visual Representation for Robot Manipulation , 2022, CoRL.
[40] Li Fei-Fei,et al. MetaMorph: Learning Universal Controllers with Transformers , 2022, ICLR.
[41] Ilija Radosavovic,et al. Masked Visual Pre-training for Motor Control , 2022, ArXiv.
[42] Amy Zhang,et al. Online Decision Transformer , 2022, ICML.
[43] Jingren Zhou,et al. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework , 2022, ICML.
[44] A. Torralba,et al. Pre-Trained Language Models for Interactive Decision-Making , 2022, NeurIPS.
[45] S. Gu,et al. Can Wikipedia Help Offline Reinforcement Learning? , 2022, ArXiv.
[46] P. Abbeel,et al. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents , 2022, ICML.
[47] Yejin Choi,et al. MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[48] W. Burgard,et al. CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks , 2021, IEEE Robotics and Automation Letters.
[49] Tsu-Jui Fu,et al. VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling , 2021, ArXiv.
[50] Lu Yuan,et al. Florence: A New Foundation Model for Computer Vision , 2021, ArXiv.
[51] R. Mottaghi,et al. Simple but Effective: CLIP Embeddings for Embodied AI , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Ross B. Girshick,et al. Masked Autoencoders Are Scalable Vision Learners , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Shubham Tulsiani,et al. A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation , 2021, CoRL.
[54] P. Abbeel,et al. Towards More Generalizable One-shot Visual Imitation Learning , 2021, 2022 International Conference on Robotics and Automation (ICRA).
[55] Alexander M. Rush,et al. Multitask Prompted Training Enables Zero-Shot Task Generalization , 2021, ICLR.
[56] Dieter Fox,et al. CLIPort: What and Where Pathways for Robotic Manipulation , 2021, CoRL.
[57] David J. Fleet,et al. Pix2seq: A Language Modeling Framework for Object Detection , 2021, ICLR.
[58] Stefan Schaal,et al. Multi-Task Learning with Sequence-Conditioned Transporter Networks , 2021, 2022 International Conference on Robotics and Automation (ICRA).
[59] Quoc V. Le,et al. Multi-Task Self-Training for Learning General Representations , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[60] Angela P. Schoellig,et al. Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning , 2021, Annu. Rev. Control. Robotics Auton. Syst..
[61] Silvio Savarese,et al. What Matters in Learning from Offline Human Demonstrations for Robot Manipulation , 2021, CoRL.
[62] Silvio Savarese,et al. BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments , 2021, CoRL.
[63] Silvio Savarese,et al. iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks , 2021, CoRL.
[64] Olivier J. H'enaff,et al. Perceiver IO: A General Architecture for Structured Inputs & Outputs , 2021, ICLR.
[65] Angel X. Chang,et al. Habitat 2.0: Training Home Assistants to Rearrange their Habitat , 2021, NeurIPS.
[66] Oriol Vinyals,et al. Multimodal Few-Shot Learning with Frozen Language Models , 2021, NeurIPS.
[67] Li Fei-Fei,et al. SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies , 2021, ICML.
[68] Xipeng Qiu,et al. A Survey of Transformers , 2021, AI Open.
[69] Ali Farhadi,et al. MERLOT: Multimodal Neural Script Knowledge Models , 2021, NeurIPS.
[70] Sergey Levine,et al. Offline Reinforcement Learning as One Big Sequence Modeling Problem , 2021, NeurIPS.
[71] Pieter Abbeel,et al. Decision Transformer: Reinforcement Learning via Sequence Modeling , 2021, NeurIPS.
[72] Doina Precup,et al. AndroidEnv: A Reinforcement Learning Platform for Android , 2021, ArXiv.
[73] Minghao Gou,et al. OCRTOC: A Cloud-Based Competition and Benchmark for Robotic Grasping and Manipulation , 2021, IEEE Robotics and Automation Letters.
[74] Roozbeh Mottaghi,et al. ManipulaTHOR: A Framework for Visual Object Manipulation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[75] Roozbeh Mottaghi,et al. Visual Room Rearrangement , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[76] Joshua B. Tenenbaum,et al. The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark Towards Physically Realistic Embodied AI , 2021, 2022 International Conference on Robotics and Automation (ICRA).
[77] Cheston Tan,et al. A Survey of Embodied AI: From Simulators to Research Tasks , 2021, IEEE Transactions on Emerging Topics in Computational Intelligence.
[78] Andrew Zisserman,et al. Perceiver: General Perception with Iterative Attention , 2021, ICML.
[79] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[80] Fahad Shahbaz Khan,et al. Transformers in Vision: A Survey , 2021, ACM Comput. Surv..
[81] Felix Hill,et al. Imitating Interactive Intelligence , 2020, ArXiv.
[82] Lyne P. Tchapmi,et al. iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes , 2020, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[83] Sudeep Dasari,et al. Transformers for One-Shot Visual Imitation , 2020, CoRL.
[84] Roozbeh Mottaghi,et al. Rearrangement: A Challenge for Embodied AI , 2020, ArXiv.
[85] Stuart J. Russell,et al. The MAGICAL Benchmark for Robust Imitation , 2020, NeurIPS.
[86] Brijen Thananjeyan,et al. Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones , 2020, IEEE Robotics and Automation Letters.
[87] Nicholas Rhinehart,et al. Conservative Safety Critics for Exploration , 2020, ICLR.
[88] Peter R. Florence,et al. Transporter Networks: Rearranging the Visual World for Robotic Manipulation , 2020, CoRL.
[89] Sehoon Ha,et al. Learning to be Safe: Deep RL with a Safety Critic , 2020, ArXiv.
[90] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[91] Chitta Baral,et al. Language-Conditioned Imitation Learning for Robot Manipulation Tasks , 2020, NeurIPS.
[92] Yoshua Bengio,et al. CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning , 2020, ICLR.
[93] Yuke Zhu,et al. robosuite: A Modular Simulation Framework and Benchmark for Robot Learning , 2020, ArXiv.
[94] Yi Tay,et al. Efficient Transformers: A Survey , 2020, ACM Comput. Surv..
[95] Stephen Clark,et al. Grounded Language Learning Fast and Slow , 2020, ICLR.
[96] Khashayar Rohanimanesh,et al. Self-Supervised Goal-Conditioned Pick and Place , 2020, ArXiv.
[97] Torsten Kröger,et al. Self-Supervised Learning for Precise Pick-and-Place Without Object Model , 2020, IEEE Robotics and Automation Letters.
[98] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[99] Corey Lynch,et al. Language Conditioned Imitation Learning Over Unstructured Data , 2020, Robotics: Science and Systems.
[100] Noam Shazeer,et al. GLU Variants Improve Transformer , 2020, ArXiv.
[101] S. Gelly,et al. Big Transfer (BiT): General Visual Representation Learning , 2019, ECCV.
[102] Jakub W. Pachocki,et al. Dota 2 with Large Scale Deep Reinforcement Learning , 2019, ArXiv.
[103] Andy Zeng,et al. Grasping in the Wild: Learning 6DoF Closed-Loop Grasping From Low-Cost Demonstrations , 2019, IEEE Robotics and Automation Letters.
[104] Marcus Rohrbach,et al. 12-in-1: Multi-Task Vision and Language Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[105] Luke Zettlemoyer,et al. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[106] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[107] Sergey Levine,et al. Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning , 2019, CoRL.
[108] S. Levine,et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning , 2019, CoRL.
[109] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[110] R'emi Louf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[111] Silvio Savarese,et al. SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning , 2019, ArXiv.
[112] Andrew J. Davison,et al. RLBench: The Robot Learning Benchmark & Learning Environment , 2019, IEEE Robotics and Automation Letters.
[113] M. Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[114] Russ Tedrake,et al. Self-Supervised Correspondence in Visuomotor Policy Learning , 2019, IEEE Robotics and Automation Letters.
[115] Juan Carlos Niebles,et al. Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[116] Oliver Kroemer,et al. Graph-Structured Visual Imitation , 2019, CoRL.
[117] Jitendra Malik,et al. Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[118] Jitendra Malik,et al. Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies , 2018 .
[119] Silvio Savarese,et al. SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark , 2018, CoRL.
[120] Sergio Gomez Colmenarejo,et al. One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL , 2018, ArXiv.
[121] Richard Socher,et al. The Natural Language Decathlon: Multitask Learning as Question Answering , 2018, ArXiv.
[122] Sanja Fidler,et al. VirtualHome: Simulating Household Activities Via Programs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[123] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[124] Sergey Levine,et al. One-Shot Visual Imitation Learning via Meta-Learning , 2017, CoRL.
[125] Percy Liang,et al. World of Bits: An Open-Domain Platform for Web-Based Agents , 2017, ICML.
[126] Demis Hassabis,et al. Grounded Language Learning in a Simulated 3D World , 2017, ArXiv.
[127] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[128] Marcin Andrychowicz,et al. One-Shot Imitation Learning , 2017, NIPS.
[129] Ross B. Girshick,et al. Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[130] Iasonas Kokkinos,et al. UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[131] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[132] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[134] Avinash C. Kak,et al. Real-time tracking and pose estimation for industrial objects using geometric features , 2003, 2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422).
[135] E. Markman,et al. Word learning in children: an examination of fast mapping. , 1987, Child development.
[136] C. K. Liu,et al. BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation , 2022, CoRL.
[137] P. Abbeel,et al. Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models , 2022, ArXiv.
[138] Max Jaderberg,et al. Open-Ended Learning Leads to Generally Capable Agents , 2021, ArXiv.
[139] David Howard,et al. A Review of Physics Simulators for Robotic Applications , 2021, IEEE Access.
[140] Pulkit Agrawal,et al. The Task Specification Problem , 2021, CoRL.
[141] Gregory D. Hager,et al. Guiding Multi-Step Rearrangement Tasks with Natural Language Instructions , 2021, CoRL.
[143] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[145] Sonia Chernova,et al. Recent Advances in Robot Learning from Demonstration , 2020, Annu. Rev. Control. Robotics Auton. Syst..