[1] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst.
[2] Panos M. Pardalos, et al. Approximate dynamic programming: solving the curses of dimensionality, 2009, Optim. Methods Softw.
[3] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[4] P. Tseng. Applications of a splitting algorithm to decomposition in convex programming and variational inequalities, 1991.
[5] E. Denardo. Contraction Mappings in the Theory Underlying Dynamic Programming, 1967.
[6] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[7] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[8] Ronald J. Williams, et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Critic Methods, 1993.
[9] Y. Censor, et al. A Note on the Behavior of the Randomized Kaczmarz Algorithm of Strohmer and Vershynin, 2009, Journal of Fourier Analysis and Applications.
[10] B. Martinet, et al. Régularisation d'inéquations variationnelles par approximations successives [Regularization of variational inequalities by successive approximations], 1970.
[11] Dimitri P. Bertsekas, et al. Constrained Optimization and Lagrange Multiplier Methods, 1982.
[12] Bruno Scherrer, et al. Performance bounds for λ policy iteration and application to the game of Tetris, 2013, J. Mach. Learn. Res.
[13] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[14] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[15] J. Halton. A Retrospective and Prospective Survey of the Monte Carlo Method, 1970.
[16] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[17] Dimitri P. Bertsekas, et al. Lambda-Policy Iteration: A Review and a New Implementation, 2013, arXiv.
[18] Stephen P. Boyd, et al. Proximal Algorithms, 2013, Found. Trends Optim.
[19] John N. Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems, 1991, Math. Oper. Res.
[20] P. Lions, et al. Splitting Algorithms for the Sum of Two Nonlinear Operators, 1979.
[21] W. Dixon. Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles, 2014.
[22] John C. Duchi, et al. Stochastic Methods for Composite Optimization Problems, 2017.
[23] Dimitri P. Bertsekas, et al. Q-learning and enhanced policy iteration in discounted dynamic programming, 2010, 49th IEEE Conference on Decision and Control (CDC).
[24] Frank L. Lewis, et al. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, 2012.
[25] J. Curtiss. "Monte Carlo" Methods for the Iteration of Linear Operators, 2017.
[26] Petros Drineas, et al. Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication, 2006, SIAM J. Comput.
[27] D. Bertsekas, et al. Solution of Large Systems of Equations Using Approximate Dynamic Programming Methods, 2007.
[28] S. Muthukrishnan, et al. Sampling algorithms for l2 regression and applications, 2006, SODA '06.
[29] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[30] R. A. Leibler, et al. Matrix inversion by a Monte Carlo method, 1950.
[31] Dimitri P. Bertsekas, et al. Stochastic Optimal Control: The Discrete Time Case, 2007.
[32] Adrian S. Lewis, et al. Randomized Methods for Linear Constraints: Convergence Rates and Conditioning, 2008, Math. Oper. Res.
[33] D. Bertsekas. On the method of multipliers for convex programming, 1975.
[34] Dmitriy Drusvyatskiy, et al. Error Bounds, Quadratic Growth, and Linear Convergence of Proximal Methods, 2016, Math. Oper. Res.
[35] D. Bertsekas, et al. Projected Equation Methods for Approximate Solution of Large Linear Systems, 2009, J. Comput. Appl. Math.
[36] S. Muthukrishnan, et al. Relative-Error CUR Matrix Decompositions, 2007, SIAM J. Matrix Anal. Appl.
[37] D. Bertsekas, et al. On the Convergence of Iterative Simulation-Based Methods for Singular Linear Systems, 2012.
[38] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[39] Dimitri P. Bertsekas, et al. Convex Optimization Algorithms, 2015.
[40] Frank L. Lewis, et al. Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles, 2012.
[41] Dimitri P. Bertsekas, et al. Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems, 2014, Math. Oper. Res.
[42] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[43] Matthieu Geist, et al. Approximate modified policy iteration and its application to the game of Tetris, 2015, J. Mach. Learn. Res.
[44] A. L. Samuel, et al. Some studies in machine learning using the game of checkers. II: recent progress, 1967.
[45] A. Barto, et al. Improved Temporal Difference Methods with Linear Function Approximation, 2004.
[46] Dimitri P. Bertsekas, et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares, 2009, IEEE Transactions on Automatic Control.
[47] Marc Teboulle, et al. Gradient-based algorithms with applications to signal-recovery problems, 2010, Convex Optimization in Signal Processing and Communications.
[48] S. Muthukrishnan, et al. Faster least squares approximation, 2007, Numerische Mathematik.
[49] W. Wasow. A note on the inversion of matrices by random walks, 1952.
[50] Huizhen Yu, et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions, 2012, SIAM J. Control Optim.
[51] Christos Boutsidis, et al. Near Optimal Column-Based Matrix Reconstruction, 2011, IEEE 52nd Annual Symposium on Foundations of Computer Science.
[52] Dimitri P. Bertsekas, et al. Q-learning and policy iteration algorithms for stochastic shortest path problems, 2012, Annals of Operations Research.
[53] F. Facchinei, et al. Finite-Dimensional Variational Inequalities and Complementarity Problems, 2003.
[54] Dimitri P. Bertsekas, et al. Incremental constraint projection methods for variational inequalities, 2014, Math. Program.
[55] Petros Drineas, et al. Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix, 2004.
[56] Dimitri P. Bertsekas, et al. Error Bounds for Approximations from Projected Linear Equations, 2010, Math. Oper. Res.
[57] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[58] T. E. S. Raghavan, et al. Algorithms for stochastic games – A survey, 1991, ZOR Methods Model. Oper. Res.
[59] Heinz H. Bauschke, et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2011, CMS Books in Mathematics.
[60] R. Vershynin, et al. A Randomized Kaczmarz Algorithm with Exponential Convergence, 2007, math/0702226.
[61] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[62] Bruno Scherrer, et al. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, 2013, NIPS.
[63] Dimitri P. Bertsekas, et al. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators, 1992, Math. Program.
[64] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[65] Bart De Schutter, et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[66] Benjamin Van Roy, et al. On the existence of fixed points for approximate value iteration and temporal-difference learning, 2000.
[67] Huizhen Yu, et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions, 2010, ICML.
[68] D. Gabay. Applications of the method of multipliers to variational inequalities, 1983.
[69] Stephen J. Wright, et al. A proximal method for composite minimization, 2008, Mathematical Programming.
[70] Dimitri P. Bertsekas, et al. Temporal Difference Methods for General Projected Equations, 2011, IEEE Transactions on Automatic Control.
[71] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.
[72] Bruno Scherrer, et al. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, 2010, ICML.
[73] M. A. Krasnoselʹskii, et al. Approximate Solution of Operator Equations, 1972.
[74] Bruno Scherrer, et al. Improvements on Learning Tetris with Cross Entropy, 2009, J. Int. Comput. Games Assoc.
[75] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[76] C. Fletcher. Computational Galerkin Methods, 1983.