CONVERGENCE OF SIMULATION-BASED POLICY ITERATION
[1] R. Bellman, et al. Dynamic Programming and Markov Processes, 1960.
[2] J. Beránek. Ronald A. Howard, “Dynamic Programming and Markov Processes”, 1961.
[3] Sheldon M. Ross, et al. Stochastic Processes, 2018, Gauge Integral Structures for Stochastic Calculus and Quantum Electrodynamics.
[4] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[5] S. Resnick. Adventures in stochastic processes, 1992.
[6] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[7] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[8] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1994, Mach. Learn.
[9] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[10] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[11] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[12] Bert Fristedt, et al. A modern approach to probability theory, 1996.
[13] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[14] J. Propp, et al. Exact sampling with coupled Markov chains and applications to statistical mechanics, 1996.
[15] Xi-Ren Cao, et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes, 1997, IEEE Trans. Autom. Control.
[16] David Bruce Wilson, et al. How to Get a Perfectly Random Sample from a Generic Markov Chain and Generate a Random Spanning Tree of a Directed Graph, 1998, J. Algorithms.
[17] Xi-Ren Cao, et al. The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes, 1998, Discret. Event Dyn. Syst.
[18] V. Borkar. Asynchronous Stochastic Approximations, 1998.
[19] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control Optim.
[20] X. Cao, et al. Single Sample Path-Based Optimization of Markov Chains, 1999.
[21] David Bruce Wilson. Layered Multishift Coupling for use in Perfect Sampling Algorithms (with a primer on CFTP), 1999.
[22] H. Thorisson. Coupling, stationarity, and regeneration, 2000.
[23] Xi-Ren Cao, et al. A unified approach to Markov decision problems and performance sensitivity analysis, 2000, at - Automatisierungstechnik.
[24] John Odentrantz, et al. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, 2000, Technometrics.
[25] Martin L. Puterman, et al. A probabilistic analysis of bias optimality in unichain Markov decision processes, 2001, IEEE Trans. Autom. Control.
[26] Peter Dayan, et al. Q-learning, 1992, Machine Learning.