Reinforcement Learning in non-stationary environments: An intrinsically motivated stress-based memory retrieval performance (SBMRP) model

Biological systems are said to learn from both intrinsic and extrinsic motivations. Extrinsic motivations, which derive largely from environmental conditions, have been well explored by Reinforcement Learning (RL) methods. Less explored, and in our opinion more interesting, are the intrinsic motivations that may drive a learning agent. In this paper we explore one such possibility. We develop a novel intrinsic motivation model based on the well-known Yerkes–Dodson stress curve and the biological principles associated with stress. A stress feedback loop modulates the agent's memory capacity for retrieval, and the stress and memory signals are fed into a fuzzy logic system that selects the agent's next action against the current best action policy. Our simulation results show that the model significantly improves agent learning performance and stability when objectively compared against existing state-of-the-art RL approaches in non-stationary environments, and that it can effectively handle significantly larger problem domains.
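
The core mechanism described above couples stress to memory retrieval capacity via the Yerkes–Dodson inverted-U relationship: performance peaks at moderate arousal and falls off at either extreme. The paper's exact formulation is not reproduced here; the following is a minimal sketch that assumes a Gaussian inverted-U curve, and all function names and parameter values (e.g. the peak at 0.5 and a default capacity of seven items) are illustrative assumptions, not the authors' implementation.

```python
import math

def yerkes_dodson_performance(stress: float, optimal: float = 0.5, width: float = 0.2) -> float:
    """Inverted-U performance curve (hypothetical Gaussian form):
    peaks at `optimal` stress and decays toward the extremes."""
    return math.exp(-((stress - optimal) ** 2) / (2 * width ** 2))

def retrieval_capacity(stress: float, max_items: int = 7) -> int:
    """Number of stored experiences the agent can retrieve at the
    current stress level; scaled by the inverted-U curve, floored at 1."""
    return max(1, round(max_items * yerkes_dodson_performance(stress)))

# Moderate stress yields the largest retrievable working set;
# very low or very high stress shrinks it.
print(retrieval_capacity(0.5))                           # peak capacity
print(retrieval_capacity(0.05), retrieval_capacity(0.95))  # extremes
```

In an RL loop, a capacity computed this way could bound how many past state–action experiences are consulted before the fuzzy decision stage compares the candidate action against the current best policy.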
