Adaptive Tile Coding for Value Function Approximation

Reinforcement learning problems are commonly tackled by estimating the optimal value function. In many real-world problems, learning this value function requires a function approximator, which maps states to values via a parameterized function. In practice, the success of function approximators depends on the ability of the human designer to select an appropriate representation for the value function. This paper presents adaptive tile coding, a novel method that automates this design process for tile coding, a popular function approximator, by beginning with a simple representation with few tiles and refining it during learning by splitting existing tiles into smaller ones. In addition to automatically discovering effective representations, this approach provides a natural way to reduce the function approximator’s level of generalization over time. Empirical results in multiple domains compare two different criteria for deciding which tiles to split and verify that adaptive tile coding can automatically discover effective representations and that its speed of learning is competitive with the best fixed representations.
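
As a rough illustration of the idea (not the paper's exact algorithm), the sketch below implements a one-dimensional adaptive tiling in Python: a single tiling stores one value per tile, updates the active tile toward a target, and periodically halves the tile that has accumulated the most absolute error. The split criterion, the supervised-style targets, and all names here are simplifying assumptions for illustration; they are not necessarily either of the two criteria the paper compares.

```python
# Minimal 1-D sketch of adaptive tile coding: a single tiling whose tiles
# are split during learning.  Split criterion (largest accumulated |error|)
# is an illustrative assumption, not the paper's specific criterion.

import bisect
import random


class AdaptiveTiling:
    """Piecewise-constant value function over [low, high) that refines itself."""

    def __init__(self, low, high, alpha=0.1):
        self.alpha = alpha
        self.bounds = [low, high]   # sorted tile boundaries
        self.values = [0.0]         # one value per tile
        self.errors = [0.0]         # accumulated |error| per tile

    def _tile(self, s):
        # Index of the tile containing state s.
        return bisect.bisect_right(self.bounds, s) - 1

    def value(self, s):
        return self.values[self._tile(s)]

    def update(self, s, target):
        # Tile-coding update: move the active tile's value toward the target
        # and record how much error that tile has seen.
        i = self._tile(s)
        delta = target - self.values[i]
        self.values[i] += self.alpha * delta
        self.errors[i] += abs(delta)

    def split_worst_tile(self):
        # Refine the representation by halving the tile with the most
        # accumulated error; both children inherit the parent's value.
        i = max(range(len(self.values)), key=lambda j: self.errors[j])
        mid = 0.5 * (self.bounds[i] + self.bounds[i + 1])
        self.bounds.insert(i + 1, mid)
        self.values.insert(i + 1, self.values[i])
        self.errors[i] = 0.0
        self.errors.insert(i + 1, 0.0)


if __name__ == "__main__":
    # Toy usage: fit a nonlinear target, splitting one tile every 500 updates.
    tiling = AdaptiveTiling(0.0, 1.0)
    for step in range(1, 5001):
        s = random.random()
        tiling.update(s, target=s * s)   # hypothetical target values
        if step % 500 == 0:
            tiling.split_worst_tile()
    print(len(tiling.values), "tiles after refinement")
```

Because splitting only subdivides existing tiles, the representation starts coarse and becomes finer where refinement is warranted, which mirrors the abstract's point that the approximator's level of generalization is reduced over time.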
