Finite-Sample Analysis of Lasso-TD

In this paper, we analyze the performance of Lasso-TD, a modification of LSTD in which the projection operator is defined as a Lasso problem. We first show that Lasso-TD is guaranteed to have a unique fixed point and that its algorithmic implementation coincides with the recently presented LARS-TD and LC-TD methods. We then derive two bounds on the prediction error of Lasso-TD in the Markov design setting, i.e., when performance is evaluated on the same states used by the method. The first bound makes no assumption but has a slow rate with respect to the number of samples. The second bound holds under an assumption on the empirical Gram matrix, called the compatibility condition, and has an improved rate that directly relates the prediction error to the sparsity of the value function in the feature space at hand.
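To make the fixed-point characterization concrete, here is a minimal NumPy sketch under stated assumptions. It assumes samples (x_i, r_i, x_{i+1}), i = 1..n, a feature matrix Phi with rows phi(x_i), a next-state matrix Phi' with rows phi(x_{i+1}), and the scaling theta = argmin_w ||R + gamma Phi' theta - Phi w||^2 / (2n) + lambda ||w||_1; the paper's own normalization may differ. It is not LARS-TD or LC-TD: it swaps in a simpler proximal fixed-point iteration whose fixed points are exactly the points satisfying the Lasso-TD optimality conditions, namely c_j = lambda sign(theta_j) when theta_j != 0 and |c_j| <= lambda otherwise, where c = Phi^T (R + gamma Phi' theta - Phi theta) / n. Convergence is only guaranteed under the sufficient condition A + A^T > 0 with A = Phi^T (Phi - gamma Phi') / n, which the code checks. All function names are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrink each entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_td_correlation(theta, phi, phi_next, rewards, gamma):
    """c(theta) = Phi^T (R + gamma * Phi' theta - Phi theta) / n."""
    n = phi.shape[0]
    td_errors = rewards + gamma * (phi_next @ theta) - phi @ theta
    return phi.T @ td_errors / n


def lasso_td_fixed_point(phi, phi_next, rewards, gamma, lam,
                         tol=1e-10, max_iter=100_000):
    """Proximal fixed-point iteration (hypothetical helper, not LARS-TD).

    Assumed scaling: theta = argmin_w ||R + gamma Phi' theta - Phi w||^2 / (2n)
    + lam * ||w||_1. Fixed points of the update below are exactly the points
    satisfying the Lasso-TD optimality conditions.
    """
    n, d = phi.shape
    a = phi.T @ (phi - gamma * phi_next) / n
    mu = np.linalg.eigvalsh(a + a.T).min()
    if mu <= 0:
        # The sufficient condition for convergence fails; this sketch does not apply.
        raise ValueError("A + A^T is not positive definite")
    lip = np.linalg.norm(a, 2)  # largest singular value of A
    # With this step, I - eta * A is a contraction; soft-thresholding is
    # non-expansive, so the composed map is a contraction with a unique fixed point.
    eta = mu / (2.0 * lip ** 2)
    theta = np.zeros(d)
    for _ in range(max_iter):
        forward = theta + eta * lasso_td_correlation(theta, phi, phi_next, rewards, gamma)
        new_theta = soft_threshold(forward, eta * lam)
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta
        theta = new_theta
    raise RuntimeError("proximal iteration did not converge")
```

For a quick experiment, one might set phi_next = np.roll(phi, -1, axis=0), a cyclic stand-in for a single trajectory (again an assumption). In that case ||Phi' x|| = ||Phi x|| for every x, so by Cauchy-Schwarz the condition A + A^T > 0 holds for any gamma < 1 and full-column-rank Phi. When lambda exceeds max_j |(Phi^T R)_j| / n, the optimality conditions are met by theta = 0 and the sketch returns the all-zero vector.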
