Paper information - Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions


Using pre-trained language models to generate action plans for embodied agents is a promising research strategy. However, executing instructions in real or simulated environments requires verifying that the actions are feasible and that they are relevant to completing the goal. We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment according to natural language instructions. Our method first generates a set of consistently achievable sub-goals from the instructions and then completes the associated sub-tasks with a pre-trained RL policy. The proposed method formed the RL baseline at the IGLU 2022 competition.
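The two-stage structure described in the abstract (sub-goal generation, then sub-task execution) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `SubGoal`, `generate_subgoals`, `is_achievable`, `execute_subtask`, and `build` are hypothetical, the "language model" is replaced by a trivial parser for instructions of the form `stack N COLOR`, the "RL policy" is replaced by a stub that writes the block directly into the grid, and the support rule used as a feasibility check is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]  # (x, y, z) voxel position; y is height


@dataclass(frozen=True)
class SubGoal:
    """One block to place: a hypothetical sub-goal representation."""
    position: Coord
    color: str


def generate_subgoals(instruction: str) -> List[SubGoal]:
    """Stand-in for the language model (assumption, not the paper's model).

    Parses instructions of the form 'stack N COLOR' into N sub-goals,
    ordered bottom-up so each block has support when it is placed.
    """
    words = instruction.lower().split()
    count, color = int(words[1]), words[2]
    return [SubGoal((0, y, 0), color) for y in range(count)]


def is_achievable(goal: SubGoal, grid: Dict[Coord, str]) -> bool:
    """Toy feasibility check: the cell is free and rests on ground or a block."""
    x, y, z = goal.position
    supported = y == 0 or (x, y - 1, z) in grid
    return goal.position not in grid and supported


def execute_subtask(goal: SubGoal, grid: Dict[Coord, str]) -> None:
    """Stand-in for the pre-trained RL policy: writes the block directly."""
    grid[goal.position] = goal.color


def build(instruction: str) -> Dict[Coord, str]:
    """Run both stages: generate sub-goals, then complete them one by one."""
    grid: Dict[Coord, str] = {}
    for goal in generate_subgoals(instruction):
        if not is_achievable(goal, grid):
            raise ValueError(f"sub-goal not achievable: {goal}")
        execute_subtask(goal, grid)
    return grid


if __name__ == "__main__":
    print(build("stack 3 red"))
```

In this sketch the ordering of sub-goals is what makes each one achievable at the time it is attempted; a real system would replace both stubs with learned components and would act from pixel observations rather than from a grid dictionary.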
