Q-learning and policy iteration algorithms for stochastic shortest path problems
[1] L. A. Zadeh, et al. Optimal Pursuit Strategies in Discrete-State Probabilistic Systems, 1962.
[2] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria, 1969.
[3] Cyrus Derman, et al. Finite State Markovian Decision Processes, 1970.
[4] Gérard M. Baudet, et al. Asynchronous Iterative Methods for Multiprocessors, 1978, JACM.
[5] Peter Whittle, et al. Optimization Over Time, 1982.
[6] Dimitri P. Bertsekas, et al. Distributed asynchronous computation of fixed points, 1983, Math. Program.
[7] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[8] John N. Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems, 1991, Math. Oper. Res.
[9] Eugene A. Feinberg, et al. On Stationary Strategies in Borel Dynamic Programming, 1992, Math. Oper. Res.
[10] Ronald J. Williams, et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Cr, 1993.
[11] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[12] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[13] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[14] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[15] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[16] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[17] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[18] Andrew G. Barto, et al. Reinforcement learning, 1998.
[19] V. Borkar. Asynchronous Stochastic Approximations, 1998.
[20] John N. Tsitsiklis, et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, 1999, IEEE Trans. Autom. Control.
[21] Vdi, et al. European control conference ECC'99, 1999.
[22] Pat Langley, et al. Editorial: On Machine Learning, 1986, Machine Learning.
[23] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[24] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[25] David Choi, et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, 2001, Discret. Event Dyn. Syst.
[26] Warren B. Powell, et al. Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2007.
[27] D. Bertsekas, et al. A Least Squares Q-Learning Algorithm for Optimal Stopping Problems, 2007.
[28] D. Bertsekas, et al. Q-learning algorithms for optimal stopping based on least squares, 2007, 2007 European Control Conference (ECC).
[29] Robin Moore. In Place Of, 2009.
[30] D. Bertsekas, et al. Projected Equation Methods for Approximate Solution of Large Linear Systems, 2022, Journal of Computational and Applied Mathematics.
[31] Dimitri P. Bertsekas, et al. Distributed asynchronous policy iteration in dynamic programming, 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[32] B. Scherrer, et al. Least-Squares Policy Iteration: Bias-Variance Trade-off in Control Problems, 2010, ICML.
[33] B. Scherrer, et al. Performance bound for Approximate Optimistic Policy Iteration, 2010.
[34] Dimitri P. Bertsekas, et al. Q-learning and enhanced policy iteration in discounted dynamic programming, 2010, 49th IEEE Conference on Decision and Control (CDC).
[35] Dimitri P. Bertsekas, et al. Approximate Dynamic Programming, 2017, Encyclopedia of Machine Learning and Data Mining.
[36] Dimitri P. Bertsekas, et al. On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems, 2013, Math. Oper. Res.
[37] Uriel G. Rothblum, et al. (Approximate) iterated successive approximations algorithm for sequential decision processes, 2013, Ann. Oper. Res.
[38] J. Walrand, et al. Distributed Dynamic Programming, 2022.