Linear Complementarity for Regularized Policy Evaluation and Improvement

Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1-regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-the-shelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a "greedy" homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.
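
To make the formulation concrete, here is a minimal sketch of how an L1-regularized linear fixed point can be written as an LCP. It follows the standard reduction of the LASSO optimality conditions to complementarity form; the symbols below (Phi for features at sampled states, Phi' for features at successor states, R for rewards, gamma for the discount, beta for the regularization weight) are introduced for illustration, and the paper's exact construction may differ in details such as sample weighting.

Writing $A = \Phi^\top (\Phi - \gamma \Phi')$ and $b = \Phi^\top R$, and splitting the coefficients as $w = w^+ - w^-$ with $w^+, w^- \ge 0$, the fixed-point (subgradient) conditions take the standard LCP form: find $z \ge 0$ such that

$$ q + M z \ge 0, \qquad z^\top (q + M z) = 0, $$

where

$$ z = \begin{pmatrix} w^+ \\ w^- \end{pmatrix}, \qquad M = \begin{pmatrix} A & -A \\ -A & A \end{pmatrix}, \qquad q = \begin{pmatrix} \beta \mathbf{1} - b \\ \beta \mathbf{1} + b \end{pmatrix}. $$

Under this reading, any off-the-shelf LCP solver applied to $(M, q)$ recovers the regularized fixed point, and a solution from a nearby problem (e.g., the previous policy in policy iteration) can seed the solver's active set, which is the warm-start mechanism described above.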
