[1] Alessandro Lazaric, et al. LSTD with Random Projections, 2010, NIPS.
[2] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML.
[3] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[4] Csaba Szepesvári, et al. Statistical linear estimation with penalized estimators: an application to reinforcement learning, 2012, ICML.
[5] Michael A. Saunders, et al. LSMR: An Iterative Algorithm for Sparse Least-Squares Problems, 2011, SIAM J. Sci. Comput.
[6] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[7] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.
[8] Vicente Hernández, et al. A robust and efficient parallel SVD solver based on restarted Lanczos bidiagonalization, 2007.
[9] Alborz Geramifard, et al. iLSTD: Eligibility Traces and Convergence Analysis, 2006, NIPS.
[10] L. Mirsky. Symmetric gauge functions and unitarily invariant norms, 1960.
[11] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[12] Gene H. Golub, et al. Calculating the singular values and pseudo-inverse of a matrix, 2007, Milestones in Matrix Computation.
[13] Rich Sutton, et al. A Deeper Look at Planning as Learning from Replay, 2015, ICML.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[16] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[17] M. Brand, et al. Fast low-rank modifications of the thin singular value decomposition, 2006.
[18] R. Lathe. Ph.D. by thesis, 1988, Nature.
[19] C. W. Groetsch, et al. The theory of Tikhonov regularization for Fredholm equations of the first kind, 1984.
[20] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[21] Shie Mannor, et al. Automatic basis function construction for approximate dynamic programming and reinforcement learning, 2006, ICML.
[22] Csaba Szepesvári, et al. Regularization in reinforcement learning, 2011.
[23] C. Eckart, et al. The approximation of one matrix by another of lower rank, 1936.
[24] Matthew W. Hoffman, et al. Finite-Sample Analysis of Lasso-TD, 2011, ICML.
[25] Alessandro Lazaric, et al. Finite-Sample Analysis of LSTD, 2010, ICML.
[26] Nathan Srebro, et al. Stochastic optimization for PCA and PLS, 2012, 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[27] Alborz Geramifard, et al. Sigma point policy iteration, 2008, AAMAS.
[28] Alborz Geramifard, et al. Incremental Least-Squares Temporal Difference Learning, 2006, AAAI.
[29] P. Hansen. The discrete Picard condition for discrete ill-posed problems, 1990.
[30] Rémi Munos, et al. Fast LSTD Using Stochastic Approximation: Finite Time Analysis and Application to Traffic Control, 2013, ECML/PKDD.
[31] Bruno Scherrer, et al. Rate of Convergence and Error Bounds for LSTD(λ), 2015, ICML.
[32] Richard S. Sutton, et al. True Online TD(λ), 2014, ICML.
[33] Per Christian Hansen. The truncated SVD as a method for regularization, 1987.
[34] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.
[35] Daniel F. Salas, et al. Benchmarking a Scalable Approximate Dynamic Programming Algorithm for Stochastic Control of Multidimensional Energy Storage Problems, 2013.