A First-Order Approach to Accelerated Value Iteration
[1] Joelle Pineau, et al. Temporal Regularization in Markov Decision Process, 2018, ArXiv.
[2] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[3] W. Beyn, et al. Stability and paracontractivity of discrete linear inclusions, 2000.
[4] R. Jungers. The Joint Spectral Radius: Theory and Applications, 2009.
[5] J. W. Nieuwenhuis, et al. Book review of D.P. Bertsekas (ed.), Dynamic Programming and Optimal Control - Volume 2, 1999.
[6] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.
[7] Carl D. Meyer, et al. Matrix Analysis and Applied Linear Algebra, 2000.
[8] Randy Cogill, et al. Reversible Markov Decision Processes with an Average-Reward Criterion, 2013, SIAM J. Control Optim.
[9] Emmanuel J. Candès, et al. Adaptive Restart for Accelerated Gradient Schemes, 2012, Foundations of Computational Mathematics.
[10] Yinyu Ye, et al. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate, 2011, Math. Oper. Res.
[11] C. E. Chidume, et al. Geometric Properties of Banach Spaces and Nonlinear Iterations, 2009.
[12] Euhanna Ghadimi, et al. Global convergence of the Heavy-ball method for convex optimization, 2015 European Control Conference (ECC).
[13] P. R. Kumar, et al. Performance bounds for queueing networks and scheduling policies, 1994, IEEE Trans. Autom. Control.
[14] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.
[15] P. Tetali. Random walks and the effective resistance of networks, 1991.
[16] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[17] Heinz H. Bauschke, et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2011, CMS Books in Mathematics.
[18] P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, 1995.
[19] Stephen P. Boyd, et al. Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations, 2018, SIAM J. Optim.
[20] Matthieu Geist, et al. Anderson Acceleration for Reinforcement Learning, 2018, EWRL 2018.
[21] David J. Aldous, et al. Lower bounds for covering times for reversible Markov chains and random walks on graphs, 1989.
[22] R. Cogill, et al. Suboptimality Bounds in Stochastic Control: A Queueing Example, 2006 American Control Conference.
[23] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[24] John N. Tsitsiklis, et al. The Lyapunov exponent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute and to approximate, 1997, Math. Control Signals Syst.
[25] Richard G. Baraniuk, et al. Fast Alternating Direction Optimization Methods, 2014, SIAM J. Imaging Sci.
[26] Kavosh Asadi, et al. An Alternative Softmax Operator for Reinforcement Learning, 2016, ICML.
[27] David Barber, et al. Approximate Newton Methods for Policy Search in Markov Decision Processes, 2016, J. Mach. Learn. Res.
[28] Peter G. Doyle, et al. Random Walks and Electric Networks, 1987.
[29] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[30] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[31] Alessandro Lazaric, et al. Active Exploration in Markov Decision Processes, 2019, AISTATS.
[32] Garud Iyengar, et al. Robust Dynamic Programming, 2005, Math. Oper. Res.
[33] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[34] Harold J. Kushner, et al. Accelerated procedures for the solution of discrete Markov control problems, 1971.
[35] Uri Yechiali, et al. A K-step look-ahead analysis of value iteration algorithms for Markov decision processes, 1996.
[36] Marc Teboulle, et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009, SIAM J. Imaging Sci.
[37] Chi-Guhn Lee, et al. Acceleration Operators in the Value Iteration Algorithms for Markov Decision Processes, 2010, Oper. Res.
[38] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[39] Vincent D. Blondel, et al. Computationally Efficient Approximations of the Joint Spectral Radius, 2005, SIAM J. Matrix Anal. Appl.
[40] Matthieu Geist, et al. Softened Approximate Policy Iteration for Markov Games, 2016, ICML.
[41] Stephen P. Boyd, et al. Convex Optimization, 2004, Cambridge University Press.
[42] Evan L. Porteus, et al. Technical Note - Accelerated Computation of the Expected Discounted Return in a Markov Chain, 1978, Oper. Res.
[43] Stéphane Gaubert, et al. The Operator Approach to Entropy Games, 2017, Theory of Computing Systems.
[44] Shie Mannor, et al. Robust MDPs with k-Rectangular Uncertainty, 2016, Math. Oper. Res.
[45] K. I. M. McKinnon, et al. On the Generation of Markov Decision Processes, 1995.
[46] Sean P. Meyn, et al. Duality and linear programs for stability and performance analysis of queuing networks and scheduling policies, 1996, IEEE Trans. Autom. Control.
[47] Martin L. Puterman, et al. On the Convergence of Policy Iteration in Stationary Dynamic Programming, 1979, Math. Oper. Res.
[48] J. Filar, et al. On the Algorithm of Pollatschek and Avi-Itzhak, 1991.
[49] U. Yechiali, et al. Accelerating Procedures of the Value Iteration Algorithm for Discounted Markov Decision Processes, Based on a One-Step Lookahead Analysis, 1994.
[50] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[51] Vladimir Yu. Protasov, et al. The greedy strategy for optimizing the Perron eigenvalue, 2018, Mathematical Programming.
[52] R. E. Kalman, et al. Controllability of linear dynamical systems, 1963.
[53] Julien M. Hendrickx, et al. A generic online acceleration scheme for optimization algorithms via relaxation and inertia, 2016, Optim. Methods Softw.
[54] Daniel Kuhn, et al. Robust Markov Decision Processes, 2013, Math. Oper. Res.
[55] Yurii Nesterov, et al. Gradient methods for minimizing composite functions, 2012, Mathematical Programming.
[56] Vineet Goyal, et al. Robust Markov Decision Process: Beyond Rectangularity, 2018, ArXiv.
[57] Emmanuel J. Candès, et al. Templates for convex cone problems with applications to sparse signal recovery, 2010, Math. Program. Comput.