Hierarchical State Abstraction Based on Structural Information Principles

In reinforcement learning with rich observations, state abstraction optimizes decision-making by discarding irrelevant environmental information. Nevertheless, recent approaches concentrate on achieving adequate representational capacity, which results in the loss of essential information and degrades their performance on challenging tasks. In this article, we propose SISA, a novel State Abstraction framework based on mathematical Structural Information principles, from an information-theoretic perspective. Specifically, we present an unsupervised, adaptive hierarchical state clustering method that requires no manual assistance and simultaneously generates an optimal encoding tree. On each non-root tree node, a new aggregation function and a conditional structural entropy are designed to achieve hierarchical state abstraction and to compensate for the sampling-induced loss of essential information during abstraction. Empirical evaluations on a visual gridworld domain and six continuous control benchmarks demonstrate that, compared with five state-of-the-art state abstraction approaches, SISA significantly improves mean episode reward and sample efficiency by up to 18.98 and 44.44%, respectively. Moreover, we show experimentally that SISA is a general framework that can be flexibly integrated with different representation-learning objectives to further improve their performance.
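To make the structural-information machinery concrete, the following minimal Python sketch (not the authors' implementation) computes the two-dimensional structural entropy of a weighted, undirected graph under a fixed partition, i.e., an encoding tree of height two; the adjacency-dict representation, the `structural_entropy` name, and the toy graph are illustrative assumptions.

```python
# Minimal sketch, NOT the authors' code: two-dimensional structural entropy of a
# weighted, undirected graph under a fixed partition (an encoding tree of height 2),
# following H^T(G) = -sum_alpha (g_alpha / vol(G)) * log2(vol(alpha) / vol(parent(alpha))).
import math

def structural_entropy(adj, partition):
    """adj: symmetric {u: {v: weight}} adjacency; partition: iterable of vertex sets."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    vol_g = sum(degree.values())  # 2 * total edge weight
    h = 0.0
    for module in partition:
        vol_m = sum(degree[u] for u in module)                      # volume of the module
        cut_m = sum(w for u in module                               # weight of edges leaving it
                    for v, w in adj[u].items() if v not in module)
        for u in module:                                            # leaf terms (vertices)
            h -= (degree[u] / vol_g) * math.log2(degree[u] / vol_m)
        h -= (cut_m / vol_g) * math.log2(vol_m / vol_g)             # module term (child of root)
    return h

# Toy usage: two triangles joined by a single edge, clustered into the two triangles.
adj = {
    0: {1: 1, 2: 1},       1: {0: 1, 2: 1},       2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1},       5: {3: 1, 4: 1},
}
print(structural_entropy(adj, [{0, 1, 2}, {3, 4, 5}]))
```

In SISA, minimizing a quantity of this form over candidate encoding trees is what yields the optimal tree underlying the hierarchical state clustering described above.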
