Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time