论文信息 - Economic Hierarchical Q-Learning

Economic Hierarchical Q-Learning

Hierarchical state decompositions address the curse-of-dimensionality in Q-learning methods for reinforcement learning (RL) but can suffer from suboptimality. In addressing this, we introduce the Economic Hierarchical Q-Learning (EHQ) algorithm for hierarchical RL. The EHQ algorithm uses subsidies to align interests such that agents that would otherwise converge to a recursively optimal policy will instead be motivated to act hierarchically optimally. The essential idea is that a parent will pay a child for the relative value to the rest of the system for "returning the world" in one state over another state. The resulting learning framework is simple compared to other algorithms that obtain hierarchical optimality. Additionally, EHQ encapsulates relevant information about value tradeoffs faced across the hierarchy at each node and requires minimal data exchange between nodes. We provide no theoretical proof of hierarchical optimality but are able demonstrate success with EHQ in empirical results.

David C. Parkes | Ruggiero Cavallo | Erik G. Schultink

[1] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[2] David Andre,et al. State abstraction for programmable reinforcement learning agents , 2002, AAAI/IAAI.

[3] John H. Holland,et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems , 1995 .

[4] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.

[5] Igor Durdanovic,et al. Evolution of Cooperative Problem Solving in an Artificial Economy , 2000, Neural Computation.

[6] Thomas Dean,et al. Decomposition Techniques for Planning in Stochastic Domains , 1995, IJCAI.

[7] Thomas G. Dietterich. State Abstraction in MAXQ Hierarchical Reinforcement Learning , 1999, NIPS.

[8] Stuart J. Russell,et al. A compact, hierarchically optimal Q-function decomposition , 2006, UAI 2006.