Statistical linear estimation with penalized estimators: an application to reinforcement learning

Motivated by value function estimation in reinforcement learning, we study statistical linear inverse problems, i.e., problems where the coefficients of a linear system to be solved are observed in noise. We consider penalized estimators, where performance is evaluated using a matrix-weighted two-norm of the defect of the estimator, measured with respect to the true, unknown coefficients. Two objective functions are considered, depending on whether the norm of the defect measured with respect to the noisy coefficients is squared or left unsquared. We propose simple yet novel, theoretically well-founded data-dependent choices of the regularization parameter for both cases; these choices avoid data splitting. A distinguishing feature of our analysis is that we derive deterministic error bounds in terms of the errors of the coefficients, thus completely separating the deterministic analysis from the analysis of the stochastic properties of these errors. We show that our results lead to new insights and bounds for linear value function estimation in reinforcement learning.
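
To make the setup concrete, here is a minimal LaTeX sketch of the problem the abstract describes. All notation (A, b, E, epsilon, W, lambda, pen, and the LSTD-style identification at the end) is an illustrative assumption based on the standard formulation of such problems, not taken from the paper itself.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% True linear system $A x = b$; only noisy coefficients are observed:
\[
  \hat A = A + E, \qquad \hat b = b + \varepsilon .
\]
% Two penalized estimators, depending on whether the defect with respect to
% the noisy coefficients enters the objective squared or unsquared
% ($\operatorname{pen}$ is a generic penalty, $\lambda > 0$ a tuning parameter):
\[
  \hat x_{\mathrm{sq}} \in \operatorname*{arg\,min}_{x}
    \ \lVert \hat A x - \hat b \rVert_2^2 + \lambda \operatorname{pen}(x),
  \qquad
  \hat x_{\mathrm{uq}} \in \operatorname*{arg\,min}_{x}
    \ \lVert \hat A x - \hat b \rVert_2 + \lambda \operatorname{pen}(x).
\]
% Performance: a matrix-weighted two-norm of the defect with respect to the
% true, unknown coefficients, for a positive semidefinite weight matrix $W$:
\[
  \mathcal{E}(\hat x) = \lVert A \hat x - b \rVert_W,
  \qquad
  \lVert v \rVert_W = \sqrt{v^\top W v}.
\]
% In linear value function estimation (e.g., LSTD with features $\varphi$,
% discount $\gamma$, rewards $r_t$), the noisy coefficients arise as sample
% averages along a trajectory $s_1, s_2, \dots, s_{n+1}$:
\[
  \hat A = \frac{1}{n} \sum_{t=1}^{n}
    \varphi(s_t)\bigl(\varphi(s_t) - \gamma\,\varphi(s_{t+1})\bigr)^{\!\top},
  \qquad
  \hat b = \frac{1}{n} \sum_{t=1}^{n} \varphi(s_t)\, r_t .
\]
\end{document}

Deterministic error bounds of the kind the abstract mentions would then control $\mathcal{E}(\hat x)$ in terms of the coefficient errors $E$ and $\varepsilon$; the stochastic analysis of how large those errors are can be carried out separately. The specific penalty and weighting used by the paper may of course differ from this sketch.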
