A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies
[1] D. Blackwell, et al. Non-Existence of Everywhere Proper Conditional Distributions, 1963.
[2] W. Rudin. Principles of Mathematical Analysis, 1964.
[3] D. Blackwell. Memoryless Strategies in Finite-Stage Dynamic Programming, 1964.
[4] D. Blackwell. Discounted Dynamic Programming, 1965.
[5] Onésimo Hernández-Lerma, et al. Controlled Markov Processes, 1965.
[6] A. F. Veinott. On Finding Optimal Policies in Discrete Dynamic Programming with No Discounting, 1966.
[7] K. Parthasarathy. Probability Measures in a Metric Space, 1967.
[8] David Blackwell, et al. Positive Dynamic Programming, 1967.
[9] D. Blackwell. A Borel Set Not Containing a Graph, 1968.
[10] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria, 1969.
[11] B. L. Miller, et al. Discrete Dynamic Programming with a Small Interest Rate, 1969.
[12] K. Hinderer, et al. Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, 1970.
[13] N. Furukawa. Markovian Decision Processes with Compact Action Spaces, 1972.
[14] D. Bertsekas. Infinite Time Reachability of State-Space Regions by Using Feedback Control, 1972.
[15] D. Blackwell, et al. The Optimal Reward Operator in Dynamic Programming, 1974.
[16] D. Freedman. The Optimal Reward Operator in Special Classes of Dynamic Programming Problems, 1974.
[17] Evan L. Porteus. On the Optimality of Structured Policies in Countable Stage Decision Processes, 1975.
[18] Manfred Schäl. Conditions for Optimality in Dynamic Programming and for the Limit of n-Stage Optimal Policies to Be Optimal, 1975.
[19] J. Neveu, et al. Discrete Parameter Martingales, 1975.
[20] D. Bertsekas. Monotone Mappings with Application in Dynamic Programming, 1977.
[21] D. Bertsekas, et al. Alternative Theoretical Frameworks for Finite Horizon Discrete-Time Stochastic Optimal Control, 1977, IEEE Conference on Decision and Control.
[22] Evan L. Porteus, et al. On the Optimality of Structured Policies in Countable Stage Decision Processes. II: Positive and Negative Problems, 1977.
[23] S. Shreve. Probability Measures and the C-Sets of Selivanovskij, 1978.
[24] D. Blackwell. Borel-Programmable Functions, 1978.
[25] Dimitri P. Bertsekas, et al. Universally Measurable Policies in Dynamic Programming, 1979, Math. Oper. Res..
[26] P. Whittle. A Simple Condition for Regularity in Negative Programming, 1979, Journal of Applied Probability.
[27] S. Shreve. Resolution of Measurability Problems in Discrete-Time Stochastic Control, 1979.
[28] P. Whittle. Stability and Characterisation Conditions in Negative Programming, 1980, Journal of Applied Probability.
[29] R. Hartley. A Simple Proof of Whittle's Bridging Condition in Dynamic Programming, 1980.
[30] S. Shreve. Borel-Approachable Functions, 1981.
[31] Rolf van Dawen, et al. Negative Dynamic Programming, 1984.
[32] William D. Sudderth, et al. The Optimal Reward Operator in Negative Dynamic Programming, 1992, Math. Oper. Res..
[33] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[34] M. K. Ghosh, et al. Discrete-Time Controlled Markov Processes with Average Cost Criterion: A Survey, 1993.
[35] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[36] Ben J. A. Kröse, et al. Learning from Delayed Rewards, 1995, Robotics Auton. Syst..
[37] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[38] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[39] Vivek S. Borkar, et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms, 1997, SIAM J. Control Optim..
[40] W. Fleming. Book Review: Discrete-Time Markov Control Processes: Basic Optimality Criteria, 1997.
[41] Andrew G. Barto, et al. Reinforcement Learning, 1998.
[42] O. Hernández-Lerma, et al. Discrete-Time Markov Control Processes, 1999.
[43] John N. Tsitsiklis, et al. Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing High-Dimensional Financial Derivatives, 1999, IEEE Trans. Autom. Control..
[44] E. Altman. Constrained Markov Decision Processes, 1999.
[45] O. Hernández-Lerma, et al. Further Topics on Discrete-Time Markov Control Processes, 1999.
[46] Sean P. Meyn, et al. Value Iteration and Optimization of Multiclass Queueing Networks, 1999, Queueing Syst. Theory Appl..
[47] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control Optim..
[48] Dudley, et al. Real Analysis and Probability: Measurability: Borel Isomorphism and Analytic Sets, 2002.
[49] Eugene A. Feinberg, et al. Total Reward Criteria, 2002.
[50] Eugene A. Feinberg, et al. Handbook of Markov Decision Processes, 2002.
[51] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[52] Sean P. Meyn. Control Techniques for Complex Networks: Workload, 2007.
[53] Dimitri P. Bertsekas, et al. Stochastic Optimal Control: The Discrete Time Case, 2007.
[54] Shashi M. Srivastava, et al. A Course on Borel Sets, 1998, Graduate Texts in Mathematics.
[55] Dimitri P. Bertsekas, et al. Distributed Asynchronous Policy Iteration in Dynamic Programming, 2010, 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[56] Dimitri P. Bertsekas, et al. Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming, 2010, 49th IEEE Conference on Decision and Control (CDC).
[57] Eugene A. Feinberg, et al. Average Cost Markov Decision Processes with Weakly Continuous Transition Probabilities, 2012, Math. Oper. Res..
[58] Dimitri P. Bertsekas, et al. On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems, 2013, Math. Oper. Res..
[59] Dimitri P. Bertsekas, et al. Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems, 2012, Annals of Operations Research.
[60] Dimitri P. Bertsekas. Abstract Dynamic Programming, 2013.
[61] Huizhen Yu. On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes, 2014, SIAM J. Control Optim..
[62] Kjetil K. Haugen. Stochastic Dynamic Programming, 2016.
[63] Peter Stone, et al. Reinforcement Learning, 2019, Scholarpedia.
[64] O. Gaans. Probability Measures on Metric Spaces, 2022.