Learning Sparse Representations in Reinforcement Learning with Sparse Coding

A variety of representation learning approaches have been investigated for reinforcement learning; much less attention, however, has been given to the utility of sparse coding. Outside of reinforcement learning, sparse coding representations have been widely used, with non-convex objectives that yield discriminative representations. In this work, we develop a supervised sparse coding objective for policy evaluation. Despite the non-convexity of this objective, we prove that all local minima are global minima, making the approach amenable to simple optimization strategies. We show empirically that using a supervised objective is key, rather than the more straightforward unsupervised sparse coding approach. We compare the learned representations to a canonical fixed sparse representation, tile coding, demonstrating that the sparse coding representation outperforms a wide variety of tile-coding representations.
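
To make the setup concrete, below is a minimal Python sketch of one plausible instantiation of a supervised sparse coding objective for policy evaluation: a reconstruction term ties a dictionary B and sparse codes H to the observed states X, while a supervised term ties the codes to value-function targets y (e.g., bootstrapped or Monte Carlo return estimates). The specific losses, update rules, and names (supervised_sparse_coding, alpha, lam, step) are illustrative assumptions for exposition, not the paper's exact algorithm.

```python
# Hedged sketch: supervised sparse coding for policy evaluation via
# alternating minimization. All hyperparameters and the particular
# objective below are assumptions, not the paper's precise formulation.
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def supervised_sparse_coding(X, y, k=50, lam=0.1, alpha=1.0,
                             step=0.01, iters=200, seed=0):
    """Jointly learn a dictionary B, sparse codes H, and value weights w by
    (approximately) minimizing
        ||X - H B||_F^2 + alpha * ||H w - y||^2 + lam * ||H||_1.

    X : (n, d) array of observed states.
    y : (n,) array of value-function targets.
    Returns (B, H, w).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    B = rng.normal(scale=0.1, size=(k, d))
    H = np.zeros((n, k))
    w = np.zeros(k)
    reg = 1e-6 * np.eye(k)  # small ridge term for numerical stability
    for _ in range(iters):
        # Sparse codes: one ISTA (proximal gradient) step on the smooth part.
        grad_H = 2 * (H @ B - X) @ B.T + 2 * alpha * np.outer(H @ w - y, w)
        H = soft_threshold(H - step * grad_H, step * lam)
        # Dictionary: closed-form least-squares update given H.
        B = np.linalg.solve(H.T @ H + reg, H.T @ X)
        # Value weights: least-squares fit of the codes to the targets.
        w = np.linalg.solve(H.T @ H + reg, H.T @ y)
    return B, H, w
```

The supervised term alpha * ||H w - y||^2 is what distinguishes this from plain (unsupervised) dictionary learning: the codes are shaped to be predictive of the value function, not merely to reconstruct the states. Setting alpha = 0 recovers the unsupervised variant the abstract contrasts against.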
