Simulated Annealing and the Generation of the Objective Function: A Model of Learning During Problem Solving

A computational model of problem solving, grounded in significant aspects of human problem-solving behavior, is introduced. Humans solving a problem often begin by searching more or less randomly and become more deterministic over time as they learn about the problem. This two-phase character of problem-solving behavior, and its relation to learning, is one of the key features the model accounts for. The model uses an accelerated simulated annealing technique as a search mechanism within a real-time dynamic-programming-like framework over a connected graph of neighboring problem states. The objective value of each node is adjusted as the model moves between nodes, so that it learns more accurate values for the nodes and compensates for misleading heuristic information as it does so. In this manner the model is shown to learn to solve isomorphs of the Balls and Boxes and Tower of Hanoi problems more effectively. The major issues investigated with the model are (a) whether such a simulated-annealing-based model exhibits the kind of random-to-directed transition in behavior that people exhibit, and (b) whether progressive discovery of the objective function, even from very little or poor initial information, is a plausible way to represent the learning that occurs during problem solving and the knowledge that results from it.
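As a rough illustration of the mechanism the abstract describes, the sketch below (a minimal, hypothetical Python rendering, not the authors' implementation; the function name, parameters, and cooling schedule are assumptions) walks a state graph, accepting moves by an annealed Metropolis criterion and backing up node values as it goes, so that behavior shifts from random to directed as the temperature falls:

```python
import math
import random

def anneal_search(neighbors, value, start, goal,
                  t0=1.0, cooling=0.9, max_steps=200, seed=0):
    """Illustrative sketch: annealed walk over a state graph with
    value backups (lower value = believed closer to the goal).
    All names and defaults here are assumptions for illustration."""
    rng = random.Random(seed)
    state, t = start, t0
    path = [state]
    for _ in range(max_steps):
        if state == goal:
            break
        cand = rng.choice(neighbors[state])
        delta = value[cand] - value[state]
        # Metropolis rule: always accept improving moves; accept
        # worsening moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / max(t, 1e-9)):
            state = cand
        # Backup: a node's value is at most one step beyond its
        # best neighbor, correcting misleading initial estimates.
        value[state] = min(value[state],
                           1 + min(value[n] for n in neighbors[state]))
        t *= cooling  # cooling makes the walk increasingly deterministic
        path.append(state)
    return path, value
```

At high temperature nearly any neighbor is accepted (the random phase); as `t` decays, only value-improving moves survive the acceptance test, giving the random-to-directed transition the model is meant to exhibit.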
