Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation