Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts

Even in the absence of external reward, babies, scientists, and others explore their world. Using some sort of adaptive predictive world model, they improve their ability to answer questions such as "What happens if I do this or that?" They lose interest both in predictable things and in things predicted to remain unpredictable despite some effort. One can design curious robots that do the same. The author’s basic idea (1990, 1991) for doing so is that a reinforcement learning (RL) controller is rewarded for action sequences that improve the predictor. Here, this idea is revisited in the context of recent results on optimal predictors and optimal RL machines. Several new variants of the basic principle are proposed. Finally, it is pointed out how the fine arts can be formally understood as a consequence of the principle: given some subjective observer, great works of art and music yield observation histories exhibiting more novel, previously unknown compressibility/regularity/predictability (with respect to the observer’s particular learning algorithm) than lesser works, thus deepening the observer’s understanding of the world and what is possible in it.
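The reward signal described above can be illustrated with a small Python sketch. It is not the author's 1990/1991 formulation, only one simple way to measure predictor improvement: the intrinsic reward is the decrease in a smoothed squared prediction error. All names (`CuriousPredictor`, `environment`, `total_rewards`), the three toy situations, and the rates `lr` and `smooth` are assumptions made for illustration. The RL controller itself is omitted; any controller could consume the returned reward.

```python
import random


class CuriousPredictor:
    """Toy world model (hypothetical): one running prediction per action."""

    def __init__(self, n_actions, lr=0.2, smooth=0.2):
        self.pred = [0.0] * n_actions       # predicted observation per action
        self.avg_err = [None] * n_actions   # smoothed squared prediction error
        self.lr = lr                        # predictor learning rate (assumed)
        self.smooth = smooth                # error smoothing rate (assumed)

    def observe(self, action, obs):
        """Learn from (action, obs) and return the intrinsic reward."""
        err = (obs - self.pred[action]) ** 2          # error before learning
        self.pred[action] += self.lr * (obs - self.pred[action])
        old = self.avg_err[action]
        if old is None:                               # first visit: no progress yet
            self.avg_err[action] = err
            return 0.0
        new = (1 - self.smooth) * old + self.smooth * err
        self.avg_err[action] = new
        return old - new                              # positive while error shrinks


def environment(action, rng):
    """Three hypothetical situations a controller could bring about."""
    if action == 0:
        return 0.0                      # already predictable
    if action == 1:
        return 1.0                      # regular, but not yet learned
    return float(rng.random() < 0.5)    # fair coin: stays unpredictable


def total_rewards(steps=200, seed=0):
    """Visit every action in turn; sum intrinsic reward per half of the run."""
    rng = random.Random(seed)
    model = CuriousPredictor(n_actions=3)
    early = [0.0, 0.0, 0.0]   # reward collected in the first half
    late = [0.0, 0.0, 0.0]    # reward collected in the second half
    for t in range(steps):
        for a in range(3):
            r = model.observe(a, environment(a, rng))
            (early if t < steps // 2 else late)[a] += r
    return early, late


if __name__ == "__main__":
    early, late = total_rewards()
    print("early:", early)
    print("late: ", late)
```

In this toy setting the already predictable action never yields reward, the learnable action yields reward only while the predictor is still improving, and the coin flip yields reward that averages out close to zero per step once its mean has been learned. A controller maximizing this signal would therefore have little reason to dwell on either the known or the irreducibly random situation.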

[1]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[2]  K. Gödel Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I , 1931 .

[4]  A. Turing On computable numbers, with an application to the Entscheidungsproblem , 1937, Proc. London Math. Soc..

[5]  David A. Huffman,et al.  A method for the construction of minimum-redundancy codes , 1952, Proceedings of the IRE.

[6]  D. Huffman A Method for the Construction of Minimum-Redundancy Codes , 1952 .

[7]  Ray J. Solomonoff,et al.  A Formal Theory of Inductive Inference. Part I , 1964, Inf. Control..

[8]  Ray J. Solomonoff,et al.  A Formal Theory of Inductive Inference. Part II , 1964, Inf. Control..

[9]  A. Kolmogorov Three approaches to the quantitative definition of information , 1968 .

[10]  Murray S. Davis,et al.  That's Interesting! , 1971 .

[11]  W. J. Studden,et al.  Theory Of Optimal Experiments , 1972 .

[12]  Ray J. Solomonoff,et al.  Complexity-based induction systems: Comparisons and convergence theorems , 1978, IEEE Trans. Inf. Theory.

[13]  Sandy Lovie How the mind works , 1980, Nature.

[14]  PAUL J. WERBOS,et al.  Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.

[15]  Jürgen Schmidhuber,et al.  Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.

[16]  Jürgen Schmidhuber,et al.  An on-line algorithm for dynamic reinforcement learning and planning in reactive environments , 1990, 1990 IJCNN International Joint Conference on Neural Networks.

[17]  Jürgen Schmidhuber,et al.  A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .

[18]  Jürgen Schmidhuber,et al.  Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.

[19]  Jürgen Schmidhuber,et al.  Learning to Generate Artificial Fovea Trajectories for Target Detection , 1991, Int. J. Neural Syst..

[20]  Jenq-Neng Hwang,et al.  Query-based learning applied to partially trained multilayer perceptrons , 1991, IEEE Trans. Neural Networks.

[21]  David J. C. MacKay,et al.  Information-Based Objective Functions for Active Data Selection , 1992, Neural Computation.

[22]  Jürgen Schmidhuber,et al.  Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.

[23]  Jürgen Schmidhuber,et al.  A Fixed Size Storage O(n3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks , 1992, Neural Computation.

[24]  David A. Cohn,et al.  Neural Network Exploration Using Optimal Experiment Design , 1993, NIPS.

[25]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[26]  Garrison W. Cottrell,et al.  Learning Mackey-Glass from 25 Examples, Plus or Minus 2 , 1993, NIPS.

[27]  Ronald J. Williams,et al.  Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .

[28]  S. Hochreiter,et al.  REINFORCEMENT DRIVEN INFORMATION ACQUISITION IN NONDETERMINISTIC ENVIRONMENTS , 1995 .

[29]  Barak A. Pearlmutter Gradient calculations for dynamic recurrent neural networks: a survey , 1995, IEEE Trans. Neural Networks.

[30]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[31]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[32]  Jürgen Schmidhuber,et al.  Sequential neural text compression , 1996, IEEE Trans. Neural Networks.

[33]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[34]  Jürgen Schmidhuber,et al.  Low-Complexity Art , 2017 .

[35]  William I. Gasarch,et al.  Book Review: An introduction to Kolmogorov Complexity and its Applications Second Edition, 1997 by Ming Li and Paul Vitanyi (Springer (Graduate Text Series)) , 1997, SIGACT News.

[36]  J. Schmidhuber Facial beauty and fractal geometry , 1998 .

[37]  Jürgen Schmidhuber,et al.  Reinforcement Learning with Self-Modifying Policies , 1998, Learning to Learn.

[38]  Jürgen Schmidhuber,et al.  The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions , 2002 .

[39]  Jürgen Schmidhuber,et al.  Hierarchies of Generalized Kolmogorov Complexities and Nonenumerable Universal Measures Computable in the Limit , 2002, Int. J. Found. Comput. Sci..

[40]  Jürgen Schmidhuber,et al.  Goedel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements , 2003, ArXiv.

[41]  Daniel Kudenko,et al.  Adaptive Agents and Multi-Agent Systems , 2003, Lecture Notes in Computer Science.

[42]  Jürgen Schmidhuber,et al.  Exploring the predictable , 2003 .

[43]  Nuttapong Chentanez,et al.  Intrinsically Motivated Reinforcement Learning , 2004, NIPS.

[44]  M. Balter Seeking the Key to Music , 2004, Science.

[45]  Jürgen Schmidhuber,et al.  Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement , 1997, Machine Learning.

[46]  Nuttapong Chentanez,et al.  Intrinsically Motivated Learning of Hierarchical Collections of Skills , 2004 .

[47]  Jürgen Schmidhuber,et al.  Gödel Machines: Towards a Technical Justification of Consciousness , 2005, Adaptive Agents and Multi-Agent Systems.

[48]  Marcus Hutter,et al.  Universal artificial intelligence , 2004 .

[49]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[50]  Jürgen Schmidhuber,et al.  Completely Self-referential Optimal Reinforcement Learners , 2005, ICANN.

[51]  Erkki Oja,et al.  Artificial Neural Networks: Biological Inspirations - ICANN 2005, 15th International Conference, Warsaw, Poland, September 11-15, 2005, Proceedings, Part I , 2005, ICANN.

[52]  Risto Miikkulainen,et al.  Developing navigation behavior through self-organizing distinctive-state abstraction , 2006, Connect. Sci..

[53]  Benjamin Kuipers,et al.  Bootstrap learning of foundational representations , 2006, Connect. Sci..

[54]  Douglas S. Blank,et al.  Introduction to developmental robotics , 2006, Connect. Sci..

[55]  Peter Stone,et al.  Towards autonomous sensor and actuator model induction on a mobile robot , 2006, Connect. Sci..

[56]  Pierre-Yves Oudeyer,et al.  The Discovery of Communication , 2006 .

[57]  Matthew Schlesinger,et al.  Decomposing infants’ object representations: A dual-route processing account , 2006, Connect. Sci..

[58]  Chrystopher L. Nehaniv,et al.  From unknown sensors and actuators to actions grounded in sensorimotor perceptions , 2006, Connect. Sci..

[59]  Brian Scassellati,et al.  Learning acceptable windows of contingency , 2006, Connect. Sci..

[60]  P. Vitányi,et al.  An Introduction to Kolmogorov Complexity and Its Applications, Third Edition , 1997, Texts in Computer Science.