Connectionist Models of Reinforcement, Imitation, and Instruction in Learning to Solve Complex Problems

We compared computational models with human performance on learning to solve a high-level, planning-intensive problem. Humans and models were subjected to three learning regimes: reinforcement, imitation, and instruction. We modeled learning by reinforcement (rewards) using SARSA with a softmax action-selection criterion and a neural-network function approximator; learning by imitation using supervised learning in a neural network; and learning by instruction using a knowledge-based neural network. We had previously found that human participants who were only told whether their answers were correct (a reinforcement group) were less accurate than participants who watched demonstrations of successful solutions of the task (an imitation group) and participants who read instructions explaining how to solve the task. Furthermore, we had found that humans who learned by imitation or instruction performed more complex solution steps than those trained by reinforcement. Our models reproduced this pattern of results.
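
To make the reinforcement regime concrete, the following is a minimal sketch of SARSA with softmax action selection. It substitutes a linear function approximator for the paper's neural network, and all names (n_features, n_actions, the single illustrative transition at the end) are assumptions for illustration, not the authors' implementation.

    import numpy as np

    n_features, n_actions = 16, 4
    alpha, gamma, tau = 0.1, 0.95, 0.5      # learning rate, discount, softmax temperature
    W = np.zeros((n_actions, n_features))   # one weight vector per action

    def q_values(phi):
        # Q(s, a) for all actions, given the state's feature vector phi
        return W @ phi

    def softmax_action(phi):
        # Sample an action with probability proportional to exp(Q / tau)
        q = q_values(phi) / tau
        p = np.exp(q - q.max())              # subtract max for numerical stability
        p /= p.sum()
        return np.random.choice(n_actions, p=p)

    def sarsa_step(phi, a, r, phi_next, a_next, done):
        # On-policy TD update: Q(s,a) <- Q(s,a) + alpha * delta * gradient
        target = r if done else r + gamma * q_values(phi_next)[a_next]
        delta = target - q_values(phi)[a]
        W[a] += alpha * delta * phi          # gradient of a linear approximator

    # Illustrative single transition with random feature vectors (hypothetical):
    phi, phi2 = np.random.rand(n_features), np.random.rand(n_features)
    a, a2 = softmax_action(phi), softmax_action(phi2)
    sarsa_step(phi, a, r=1.0, phi_next=phi2, a_next=a2, done=False)

Note that the target r + gamma * Q(s', a') uses the action actually sampled by the softmax policy, which is what makes SARSA on-policy, in contrast to Q-learning's max over actions.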
