Robots That Learn Language: Developmental Approach to Human-Machine Conversations

This paper describes a machine learning method that enables robots to learn linguistic communication from scratch through verbal and nonverbal interaction with users. The method addresses two major problems that must be solved to realize natural human-machine conversation: a scalable grounded symbol system and belief sharing. Learning takes place during joint perception and joint action with a user. The method enables the robot to learn beliefs for communication by combining speech, visual, and behavioral reinforcement information in a probabilistic framework. The learned beliefs include speech units such as phonemes or syllables, a lexicon, grammar, and pragmatic knowledge, all integrated into a system represented by a dynamical graphical model. The method also enables the user and the robot to infer the state of each other's communication-related beliefs. To facilitate this inference, the robot's belief system has a structure that represents the assumption of shared beliefs and that can be adapted quickly and robustly through communication with the user. This adaptive behavior is modeled as the structural coupling of the belief systems held by the robot and the user, and it is carried out by incremental online optimization during interaction. Experimental results show that, after a small and practical number of learning episodes with a user, the robot was able to understand even fragmentary and ambiguous utterances, act upon them, and generate utterances appropriate to the given situation. The paper also discusses the importance of properly handling the risk of being misunderstood, both to facilitate mutual understanding and to keep the coupling effective.
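
The abstract says that speech, visual, and behavioral information are combined in a probabilistic framework and that the belief system adapts through incremental online optimization. The Python sketch below illustrates one generic way such fusion and adaptation can work; it is not the paper's model. The modality weights, candidate action names, likelihood values, and the perceptron-style weight update are all hypothetical assumptions chosen only to show how non-speech evidence can resolve a fragmentary utterance and how user feedback can shift the weighting.

```python
import math


def combined_score(log_liks, weights):
    # Log-linear fusion: weighted sum of per-modality log-likelihoods.
    return sum(weights[m] * log_liks[m] for m in weights)


def interpret(candidates, weights):
    # Pick the candidate action whose combined score is highest.
    return max(candidates, key=lambda c: combined_score(candidates[c], weights))


def update_weights(candidates, weights, chosen, correct, lr=1.0):
    # Perceptron-style correction (an assumed stand-in, not the paper's
    # optimization): shift weight toward modalities that favored the
    # correct action over the wrongly chosen one.
    if chosen == correct:
        return dict(weights)
    return {
        m: w + lr * (candidates[correct][m] - candidates[chosen][m])
        for m, w in weights.items()
    }


# Hypothetical scores for a fragmentary utterance: speech cannot tell the
# two actions apart, so vision and behavioral context must decide.
candidates = {
    "grasp-red-box": {"speech": math.log(0.5), "vision": math.log(0.7), "behavior": math.log(0.5)},
    "grasp-blue-cup": {"speech": math.log(0.5), "vision": math.log(0.9), "behavior": math.log(0.4)},
}
weights = {"speech": 1.0, "vision": 0.8, "behavior": 0.5}

first = interpret(candidates, weights)  # robot's initial guess
weights = update_weights(candidates, weights, first, "grasp-red-box")  # user signals the intended action
second = interpret(candidates, weights)  # guess after the feedback
print(first, second)
```

With these made-up numbers the robot first chooses `grasp-blue-cup`, and after a single corrective update it chooses `grasp-red-box`; the speech weight is unchanged because speech did not distinguish the two candidates.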
