Learning Sparse Representations in Reinforcement Learning with Sparse Coding

A variety of representation learning approaches have been investigated for reinforcement learning; much less attention, however, has been given to the utility of sparse coding. Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that yield discriminative representations. In this work, we develop a supervised sparse coding objective for policy evaluation. Despite the non-convexity of this objective, we prove that all local minima are global minima, making the approach amenable to simple optimization strategies. We show empirically that using a supervised objective is key, rather than the more straightforward unsupervised sparse coding approach. We compare the learned representations to a canonical fixed sparse representation, tile coding, demonstrating that the sparse coding representation outperforms a wide variety of tile-coding representations.
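
To make the setup concrete, below is a minimal Python sketch of one plausible instantiation of a supervised sparse coding objective for policy evaluation: a reconstruction term ties a dictionary B and sparse codes H to the observed states X, while a supervised term ties the codes to value-function targets y (e.g., bootstrapped or Monte Carlo return estimates). The specific losses, update rules, and names (supervised_sparse_coding, alpha, lam, step) are illustrative assumptions for exposition, not the paper's exact algorithm.

```python
# Hedged sketch: supervised sparse coding for policy evaluation via
# alternating minimization. All hyperparameters and the particular
# objective below are assumptions, not the paper's precise formulation.
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def supervised_sparse_coding(X, y, k=50, lam=0.1, alpha=1.0,
                             step=0.01, iters=200, seed=0):
    """Jointly learn a dictionary B, sparse codes H, and value weights w by
    (approximately) minimizing
        ||X - H B||_F^2 + alpha * ||H w - y||^2 + lam * ||H||_1.

    X : (n, d) array of observed states.
    y : (n,) array of value-function targets.
    Returns (B, H, w).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    B = rng.normal(scale=0.1, size=(k, d))
    H = np.zeros((n, k))
    w = np.zeros(k)
    reg = 1e-6 * np.eye(k)  # small ridge term for numerical stability
    for _ in range(iters):
        # Sparse codes: one ISTA (proximal gradient) step on the smooth part.
        grad_H = 2 * (H @ B - X) @ B.T + 2 * alpha * np.outer(H @ w - y, w)
        H = soft_threshold(H - step * grad_H, step * lam)
        # Dictionary: closed-form least-squares update given H.
        B = np.linalg.solve(H.T @ H + reg, H.T @ X)
        # Value weights: least-squares fit of the codes to the targets.
        w = np.linalg.solve(H.T @ H + reg, H.T @ y)
    return B, H, w
```

The supervised term alpha * ||H w - y||^2 is what distinguishes this from plain (unsupervised) dictionary learning: the codes are shaped to be predictive of the value function, not merely to reconstruct the states. Setting alpha = 0 recovers the unsupervised variant the abstract contrasts against.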
