A two-phase time aggregation algorithm for average cost Markov decision processes
[1] Peter B. Luh, et al. Incremental Value Iteration for Time-Aggregated Markov-Decision Processes, 2007, IEEE Transactions on Automatic Control.
[2] Marcelo D. Fragoso, et al. Approximate dynamic programming via direct search in the space of value function approximations, 2011, Eur. J. Oper. Res.
[3] Adam Shwartz, et al. Exact finite approximations of average-cost countable Markov decision processes, 2007, Autom.
[4] Warren B. Powell, et al. Approximate Dynamic Programming - Solving the Curses of Dimensionality, 2007.
[5] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[6] Zhiyuan Ren, et al. A time aggregation approach to Markov decision processes, 2002, Autom.
[7] Hui Peng, et al. A Survey of Approximate Dynamic Programming, 2009, 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics.
[8] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[9] Steven I. Marcus, et al. Simulation-based Algorithms for Markov Decision Processes, 2013.
[10] Marcelo D. Fragoso, et al. Time aggregated Markov decision processes via standard dynamic programming, 2011, Oper. Res. Lett.
[11] Jiaqi Zhang, et al. A semi-Markov model with holdout transshipment policy and phase-type exponential lead time, 2011, Eur. J. Oper. Res.
[12] Bart De Schutter, et al. Approximate dynamic programming with a fuzzy parameterization, 2010, Autom.
[13] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[14] E. Fainberg. Sufficient Classes of Strategies in Discrete Dynamic Programming I: Decomposition of Randomized Strategies and Embedded Models, 1987.
[15] Xi-Ren Cao, et al. Lebesgue-Sampling-Based Optimal Control Problems With Time Aggregation, 2011, IEEE Transactions on Automatic Control.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Zhiyuan Ren, et al. Markov decision processes with fractional costs, 2005, IEEE Transactions on Automatic Control.
[18] Warren B. Powell, et al. Adaptive Stochastic Control for the Smart Grid, 2011, Proceedings of the IEEE.
[19] Edilson Fernandes de Arruda, et al. Stability and optimality of a multi-product production and storage system under demand uncertainty, 2008, Eur. J. Oper. Res.
[20] D. Bertsekas. A New Value Iteration method for the Average Cost Dynamic Programming Problem, 1998.
[21] W. Marsden. I and J, 2012.
[22] Ioannis Ch. Paschalidis, et al. A Distributed Actor-Critic Algorithm and Applications to Mobile Sensor Network Coordination Problems, 2010, IEEE Transactions on Automatic Control.
[23] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.