Autonomous mental development in high-dimensional context and action spaces

Autonomous Mental Development (AMD) of robots has opened a new paradigm for developing machine intelligence using neural-network techniques, fundamentally changing the way an intelligent machine is developed: from manual design to autonomous learning. The work presented here is part of the SAIL (Self-Organizing Autonomous Incremental Learner) project, which deals with the autonomous development of a humanoid robot with vision, audition, manipulation, and locomotion. The major issue addressed here is the challenge of a high-dimensional action space (5-10 dimensions) in addition to the high-dimensional context space (hundreds to thousands of dimensions and beyond) typically required by an AMD machine. This is the first work to study a high-dimensional (numeric) action space in conjunction with a high-dimensional perception (context-state) space under the AMD mode. Two new learning algorithms, Direct Update on Direction Cosines (DUDC) and High-Dimensional Conjugate Gradient Search (HCGS), are developed, implemented, and tested. The convergence properties of both algorithms and their targeted applications are discussed. Autonomous learning of speech production under reinforcement learning is studied as an example.
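The abstract does not specify the HCGS algorithm itself; as background for readers unfamiliar with conjugate-direction search, the sketch below shows the classical conjugate gradient method on a quadratic objective, which such high-dimensional search methods build on. This is an illustrative textbook implementation, not the paper's algorithm; the function name and example matrix are my own.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive
    definite A via the classical conjugate gradient method. Each step
    searches along a direction conjugate to all previous ones, so in
    exact arithmetic convergence takes at most n iterations."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x               # residual = negative gradient of f
    d = r.copy()                # initial search direction
    rs_old = r @ r
    for _ in range(max_iter or n):
        Ad = A @ d
        alpha = rs_old / (d @ Ad)        # exact line search along d
        x = x + alpha * d
        r = r - alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs_old) * d    # conjugate direction update
        rs_old = rs_new
    return x

# Example: solve a small SPD system A x = b (the minimizer of f)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

The appeal in high dimensions is that only matrix-vector products and a few vectors of storage are needed, with no explicit matrix factorization.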
