Autonomous Acquisition of Natural Situated Communication

An important part of human intelligence, both historically and operationally, is our ability to communicate. We learn how to communicate, and maintain our communicative skills, in a society of communicators – a highly effective way to reach and maintain proficiency in this complex skill. Principles that might allow artificial agents to learn language this way are incompletely known at present – the multi-dimensional nature of socio-communicative skill lies beyond every machine learning framework so far proposed. Our work begins to address this challenge by proposing a way for observation-based machine learning of natural language and communication. Our framework can learn complex communicative skills with minimal up-front knowledge. The system learns by incrementally producing predictive models of causal relationships in observed data, guided by goal inference and reasoning using forward-inverse models. We present results from two experiments in which our S1 agent learns human communication by observing two humans interacting in a real-time TV-style interview, using multimodal communicative gesture and situated language to talk about the recycling of various materials and objects. S1 can learn complex multimodal language and communicative acts, including a vocabulary of 100 words forming natural sentences with relatively complex sentence structure, manual deictic reference, and anaphora. S1 is seeded only with high-level information about the goals of the interviewer and interviewee, and a small ontology; no grammar or other linguistic information is provided to S1 a priori. The agent learns the pragmatics, semantics, and syntax of complex spoken utterances and gestures from scratch, by observing the humans compare and contrast the cost and pollution associated with recycling aluminum cans, glass bottles, newspaper, plastic, and wood. After 20 hours of observation S1 can perform an unscripted TV interview with a human, in the same style, without making mistakes.
