
Batch-iFDD for Representation Expansion in Large MDPs

Matching pursuit (MP) methods are a promising class of feature construction algorithms for value function approximation. Yet existing MP methods require a pool of potential features, which demands either expert knowledge or the enumeration of a large feature pool; both hinder scalability. This paper introduces batch incremental feature dependency discovery (Batch-iFDD), an MP method that inherits a provable convergence property. Because Batch-iFDD does not require a large pool of features, it also has lower computational complexity. Empirical policy evaluation results across three domains with up to one million states highlight the scalability of Batch-iFDD over the previous state-of-the-art MP algorithm.
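The abstract does not spell out how features are added without a pool. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes, as in the authors' earlier online iFDD algorithm, binary features and candidate features formed as conjunctions of pairs of features that are active together in some sample. It also assumes a matching-pursuit-style score: the magnitude of the summed TD errors on samples where a candidate is active, divided by the square root of its activation count. The paper's exact criterion and bookkeeping may differ. All function names (`active`, `value`, `td_errors`, `best_conjunction`) and the frozenset encoding are made up for this example.

```python
import math
from itertools import combinations

# Hypothetical encoding (not from the paper): a state is the frozenset of
# base binary features that are on. A learned feature is a frozenset of base
# indices and is active when all of them are on, i.e. it is a conjunction.
# `features` is a set of such frozensets; it starts with singletons only.


def active(features, state):
    """Learned features active in `state` (conjunctions fully contained in it)."""
    return [f for f in features if f <= state]


def value(features, weights, state):
    """Linear value estimate: sum of the weights of the active features."""
    return sum(weights.get(f, 0.0) for f in active(features, state))


def td_errors(samples, features, weights, gamma):
    """TD error r + gamma * V(s') - V(s) for each (s, r, s') sample."""
    return [
        r + gamma * value(features, weights, s_next) - value(features, weights, s)
        for s, r, s_next in samples
    ]


def best_conjunction(samples, features, weights, gamma, threshold):
    """Return the best new pairwise conjunction, or None if no score exceeds
    `threshold`. Assumed MP-style score for a binary candidate c:
        |sum_i c(s_i) * delta_i| / sqrt(sum_i c(s_i))
    Candidates come only from pairs of features co-active in some sample,
    so no feature pool is enumerated up front."""
    deltas = td_errors(samples, features, weights, gamma)
    sums, counts = {}, {}
    for (s, _, _), delta in zip(samples, deltas):
        # Distinct new conjunctions active in this sample, each counted once.
        cands = {f | g for f, g in combinations(active(features, s), 2)} - features
        for c in cands:
            sums[c] = sums.get(c, 0.0) + delta
            counts[c] = counts.get(c, 0) + 1
    best, best_score = None, threshold
    for c, total in sums.items():
        score = abs(total) / math.sqrt(counts[c])
        if score > best_score:
            best, best_score = c, score
    return best
```

In this sketch, adding the returned conjunction to `features` and refitting `weights` with a batch policy-evaluation method (for example LSTD, not shown) would complete one hypothetical expansion round. Scoring only co-active pairs keeps the candidate set tied to the data rather than to a precomputed pool.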

Alborz Geramifard | Jonathan P. How | Thomas J. Walsh | Nicholas Roy
