Potential-based online policy iteration algorithms for Markov decision processes
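The title names potential-based policy iteration for Markov decision processes, the line of work developed in the Cao et al. references below. As a rough orientation only, here is a minimal sketch of average-reward policy iteration via performance potentials (the Poisson equation (I - P)g = f - η·1); every function name, matrix, and reward below is an illustrative assumption, not taken from the paper.

```python
import numpy as np

def potentials(P, f):
    """Solve the Poisson equation (I - P) g = f - eta*1 for the
    performance potential g, normalized so that pi^T g = 0."""
    n = P.shape[0]
    # stationary distribution pi: solve pi P = pi with sum(pi) = 1
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    eta = pi @ f                      # long-run average reward
    # (I - P + 1 pi^T) g = f - eta*1 has a unique solution with pi^T g = 0
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    return g, eta

def policy_iteration(P_a, f_a):
    """P_a[a] is the transition matrix and f_a[a] the reward vector
    under action a; the policy picks one action per state."""
    n = P_a[0].shape[0]
    policy = np.zeros(n, dtype=int)
    while True:
        # transition matrix and reward vector induced by the current policy
        P = np.array([P_a[policy[i]][i] for i in range(n)])
        f = np.array([f_a[policy[i]][i] for i in range(n)])
        g, eta = potentials(P, f)
        # improvement step: maximize f(i, a) + sum_j p_a(i, j) g(j)
        new_policy = np.argmax(
            [[f_a[a][i] + P_a[a][i] @ g for a in range(len(P_a))]
             for i in range(n)], axis=1)
        if np.array_equal(new_policy, policy):
            return policy, eta
        policy = new_policy
```

The online/simulation-based variants studied in this literature estimate g and η from a single sample path rather than solving the linear systems directly; the sketch above only shows the model-based skeleton they build on.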
[1] Xi-Ren Cao, et al. Perturbation analysis and optimization of queueing networks, 1983.
[2] Peter W. Glynn, et al. Likelihood ratio gradient estimation: an overview, 1987, WSC '87.
[3] Rajan Suri, et al. Single Run Optimization of Discrete Event Simulations—An Empirical Study Using the M/M/1 Queue, 1989.
[4] P. Glynn. Optimization of stochastic systems via simulation, 1989, WSC '89.
[5] Alan Weiss, et al. Sensitivity Analysis for Simulations via Likelihood Ratios, 1989, Oper. Res.
[6] Xi-Ren Cao, et al. Perturbation analysis of discrete event dynamic systems, 1991.
[7] Peter W. Glynn, et al. Gradient estimation for ratios, 1991, 1991 Winter Simulation Conference Proceedings.
[8] P. R. Kumar, et al. Re-entrant lines, 1993, Queueing Syst. Theory Appl.
[9] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[10] M. K. Ghosh, et al. Discrete-time controlled Markov processes with average cost criterion: a survey, 1993.
[11] Michael C. Fu, et al. Smoothed perturbation analysis derivative estimation for Markov chains, 1994, Oper. Res. Lett.
[12] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[13] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[14] E. Chong, et al. Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, 1994, IEEE Trans. Autom. Control.
[15] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[16] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[17] Stephen M. Robinson, et al. Sample-path optimization of convex stochastic performance functions, 1996, Math. Program.
[18] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[19] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[20] Sean P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general state space, 1997, IEEE Trans. Autom. Control.
[21] Xi-Ren Cao, et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes, 1997, IEEE Trans. Autom. Control.
[22] Xi-Ren Cao, et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, 1998, IEEE Trans. Control. Syst. Technol.
[23] Xi-Ren Cao, et al. The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes, 1998, Discret. Event Dyn. Syst.
[24] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[25] Loo Hay Lee, et al. Explanation of goal softening in ordinal optimization, 1999, IEEE Trans. Autom. Control.
[26] Christos G. Cassandras, et al. Introduction to Discrete Event Systems, 1999, The Kluwer International Series on Discrete Event Dynamic Systems.
[27] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[28] X. Cao, et al. Single Sample Path-Based Optimization of Markov Chains, 1999.
[29] John Odentrantz, et al. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, 2000, Technometrics.
[30] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[31] Zhiyuan Ren, et al. A time aggregation approach to Markov decision processes, 2002, Autom.
[32] Xi-Ren Cao, et al. Gradient-based policy iteration: an example, 2002, Proceedings of the 41st IEEE Conference on Decision and Control.
[33] William L. Cooper, et al. Convergence of Simulation-Based Policy Iteration, 2003, Probability in the Engineering and Informational Sciences.
[34] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[35] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[36] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[37] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.