MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks

Integrating autonomous agents into a human world ideally requires that they be able to collaborate on human terms. In particular, theory of mind plays an important role in maintaining common ground during human collaboration and communication. To enable theory of mind modeling in situated interactions, we introduce a fine-grained dataset of collaborative tasks performed by pairs of human subjects in the 3D virtual blocks world of Minecraft. The dataset captures partners’ beliefs about the world and about each other as an interaction unfolds, offering abundant opportunities to study human collaborative behavior in situated language communication. As a first step towards our goal of developing embodied AI agents that can infer the belief states of collaborative partners in situ, we build computational models for several theory of mind tasks and present their results.
