Extracting bottlenecks for reinforcement learning agent by holonic concept clustering and attentional functions

Reinforcement learning does not scale well to high-dimensional state spaces. Hierarchical reinforcement learning addresses this problem through task decomposition, which in turn relies on extracting bottleneck states; this extraction is itself challenging, particularly in terms of time and memory complexity and the need for prior knowledge of the environment. To alleviate these issues, a new approach to extracting bottlenecks is proposed that combines holonic concept clustering with attentional functions, so that bottlenecks are detected indirectly. States are organized according to the effects of actions by means of holonic clustering, yielding high-level concepts, and these concepts are then used as cues for controlling attention. The proposed mechanism has better time complexity and requires less assistance from the designer than comparable methods. Experimental results on traditional benchmarks show a considerable improvement in the precision of bottleneck detection and in the agent's performance compared with similar methods.
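Since the abstract only outlines the mechanism, the following is a minimal illustrative sketch, not the paper's algorithm: it builds a classic two-room gridworld benchmark, clusters states by a crude "effect of actions" feature using k-means (standing in for holonic concept clustering), and flags states whose one-step neighbours span more than one cluster as bottleneck candidates. The feature choice, the use of k-means, and all names are assumptions made for illustration only.

```python
# Illustrative sketch only (not the paper's holonic clustering):
# cluster states of a two-room gridworld by a successor-state feature,
# then flag states whose neighbours span more than one cluster.
import numpy as np
from sklearn.cluster import KMeans

ROWS, COLS, DOOR_ROW = 5, 11, 2   # two 5x5 rooms joined through one doorway
WALL_COL = 5                      # column of the dividing wall

def is_free(r, c):
    """A cell is free unless it lies on the dividing wall (except the door)."""
    if not (0 <= r < ROWS and 0 <= c < COLS):
        return False
    return c != WALL_COL or r == DOOR_ROW

def neighbours(r, c):
    """Successor states under the four move actions (bumping a wall stays put)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        yield (nr, nc) if is_free(nr, nc) else (r, c)

states = [(r, c) for r in range(ROWS) for c in range(COLS) if is_free(r, c)]
index = {s: i for i, s in enumerate(states)}

# Feature of a state = mean successor position under the four actions,
# a crude stand-in for the "effects of actions" used only in this sketch.
features = np.array([np.mean(list(neighbours(*s)), axis=0) for s in states])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# A bottleneck candidate is a state whose one-step neighbours belong to
# more than one cluster, i.e. it connects otherwise separate regions.
candidates = [s for s in states
              if len({labels[index[n]] for n in neighbours(*s)}) > 1]
print("bottleneck candidates:", candidates)
```

In the two-room layout the doorway cell and its immediate neighbours are the expected candidates; in the paper the clustering is instead holonic and hierarchical, and the resulting high-level concepts serve as cues for an attentional mechanism rather than being thresholded directly as above.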
