Compressive Reinforcement Learning with Oblique Random Projections

Compressive sensing has rapidly grown into a non-adaptive dimensionality reduction framework in which high-dimensional data is projected onto a randomly generated subspace. In this paper we explore a paradigm called compressive reinforcement learning, where approximately optimal policies are computed in a low-dimensional subspace generated from a high-dimensional feature space through random projections. We use the framework of oblique projections, which unifies two popular methods for approximately solving MDPs, the fixed point (FP) and Bellman residual (BR) methods, and derive error bounds on the quality of approximations obtained by combining random projections and oblique projections on a finite set of samples. We investigate the effectiveness of fixed point, Bellman residual, and hybrid least-squares methods in feature spaces generated by random projections. Finally, we present simulation results on various continuous MDPs, which show gains in both computation time and effectiveness in problems with large feature spaces and small sample sets.
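The core pipeline described above, projecting a high-dimensional feature space onto a random low-dimensional subspace and then solving for the fixed point of the projected Bellman equation, can be sketched as follows. This is a minimal illustrative sketch (not the paper's implementation): it assumes a Gaussian random projection and the standard LSTD fixed-point solve, and all names (`compressive_lstd`, `phi`, `phi_next`) are hypothetical.

```python
import numpy as np

def compressive_lstd(phi, phi_next, rewards, d, gamma=0.95, seed=0):
    """Sketch of the LSTD fixed-point solve in a randomly projected space.

    phi, phi_next : (n, D) high-dimensional features at the n sampled
        states and their successor states.
    rewards       : (n,) observed rewards.
    d             : target (low) dimension of the random subspace, d << D.

    Illustrative only; names and details are not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    D = phi.shape[1]
    # Gaussian random projection, scaled so norms are roughly preserved.
    proj = rng.normal(0.0, 1.0 / np.sqrt(d), size=(D, d))
    psi = phi @ proj            # compressed features, shape (n, d)
    psi_next = phi_next @ proj
    # Fixed-point (FP) solution of the projected Bellman equation:
    # A w = b with A = Psi^T (Psi - gamma Psi'), b = Psi^T r.
    A = psi.T @ (psi - gamma * psi_next)
    b = psi.T @ rewards
    w = np.linalg.solve(A, b)
    return w, proj  # value estimate at features x: (x @ proj) @ w
```

The d-by-d linear system is what yields the computational gains mentioned in the abstract: the solve costs O(d^3) rather than O(D^3) in the original feature space.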
