Including cognitive biases and distance-based rewards in a connectionist model of complex problem solving

We present a cognitive, connectionist model of complex problem solving that integrates cognitive biases with distance-based and environmental rewards under a temporal-difference learning mechanism. The model is tested against experimental data obtained in a well-defined, planning-intensive problem. We show that incorporating cognitive biases (symmetry and simplicity) into a temporal-difference learning rule (SARSA) increases model adequacy: the solution space explored by biased models better fits observed human solutions. While learning from explicit rewards alone is intrinsically slow, adding distance-based rewards, a measure of closeness to the goal, to the learning rule significantly accelerates learning. Finally, the model correctly predicts that explicit rewards have little impact on problem solvers' ability to discover optimal solutions.
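
To make the learning rule concrete, the following is a minimal sketch, not the authors' implementation: a tabular SARSA update whose reward combines a sparse explicit reward with a distance-based shaping term measuring progress toward the goal. The toy line-walk environment, the constant LAMBDA_DIST, and all function names are illustrative assumptions introduced only to keep the example self-contained and runnable.

```python
import random

# Hedged sketch of SARSA with a distance-based reward term (assumed toy setup).
ALPHA, GAMMA, EPSILON, LAMBDA_DIST = 0.1, 0.9, 0.1, 0.5
GOAL = 10  # rightmost position of the assumed toy line-walk problem


def distance_to_goal(state):
    """Closeness-to-goal heuristic; a real problem would supply its own."""
    return GOAL - state


def step(state, action):
    """Move left (-1) or right (+1); explicit reward is given only at the goal."""
    next_state = max(0, min(GOAL, state + action))
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(Q, state):
    """Pick a random action with probability EPSILON, otherwise the greedy one."""
    if random.random() < EPSILON:
        return random.choice((-1, 1))
    return max((-1, 1), key=lambda a: Q.get((state, a), 0.0))


def sarsa_episode(Q):
    state, action = 0, epsilon_greedy(Q, 0)
    done = False
    while not done:
        next_state, extrinsic, done = step(state, action)
        # Shaped reward: sparse explicit reward plus progress toward the goal,
        # the kind of distance-based term the paper reports accelerates learning.
        reward = extrinsic + LAMBDA_DIST * (
            distance_to_goal(state) - distance_to_goal(next_state)
        )
        next_action = epsilon_greedy(Q, next_state)
        target = reward + (0.0 if done else GAMMA * Q.get((next_state, next_action), 0.0))
        Q[(state, action)] = Q.get((state, action), 0.0) + ALPHA * (
            target - Q.get((state, action), 0.0)
        )
        state, action = next_state, next_action


if __name__ == "__main__":
    Q = {}
    for _ in range(200):
        sarsa_episode(Q)
    print(sorted(Q.items()))
```

In this sketch the shaping term rewards any move that reduces the distance to the goal, so the agent receives informative feedback long before it first reaches the sparse explicit reward; the cognitive biases discussed in the paper would act elsewhere, by constraining which actions are considered in the first place.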
