CHALET: Cornell House Agent Learning Environment

We present CHALET, a 3D house simulator with support for navigation and manipulation. CHALET includes 58 rooms and 10 house configurations, and allows users to easily create new house and room layouts. CHALET supports a range of common household activities, including moving objects, toggling appliances, and placing objects inside closeable containers. The environment and the actions available within it are designed to create a challenging domain for training and evaluating autonomous agents, including for tasks that combine language, vision, and planning in a dynamic environment.
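To make the agent-environment setup concrete, the sketch below shows the kind of discrete observe-act loop such a simulator exposes, covering both navigation and manipulation actions. This is a minimal, self-contained illustration under assumed names; the `DummyHouseEnv` class and the specific action strings are hypothetical and are not CHALET's actual API.

```python
# Hypothetical sketch of a discrete agent-environment loop for a household
# simulator. All class and action names here are illustrative assumptions.
from dataclasses import dataclass
from typing import List

NAV_ACTIONS = ["move_forward", "turn_left", "turn_right"]
INTERACT_ACTIONS = ["pick_up", "put_down", "toggle", "open", "close"]

@dataclass
class Observation:
    frame: List[List[int]]      # placeholder for a first-person RGB frame
    holding_object: bool = False

class DummyHouseEnv:
    """Stand-in environment: accepts actions and returns observations."""

    def reset(self) -> Observation:
        self.steps, self.holding = 0, False
        return Observation(frame=[[0]], holding_object=self.holding)

    def step(self, action: str) -> Observation:
        assert action in NAV_ACTIONS + INTERACT_ACTIONS, f"unknown action: {action}"
        self.steps += 1
        if action == "pick_up":
            self.holding = True
        elif action == "put_down":
            self.holding = False
        return Observation(frame=[[self.steps]], holding_object=self.holding)

if __name__ == "__main__":
    env = DummyHouseEnv()
    obs = env.reset()
    # Navigate to an object, pick it up, open a container, place it, close it.
    for action in ["move_forward", "turn_left", "pick_up", "open", "put_down", "close"]:
        obs = env.step(action)
        print(action, "-> holding:", obs.holding_object)
```

A real agent would replace the fixed action list with a learned policy conditioned on the observation (and, for language-guided tasks, on an instruction).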
