Self Learning Control of Constrained Markov Decision Processes - A Gradient Approach
[1] G. Pflug. Stochastic Approximation Methods for Constrained and Unconstrained Systems - Kushner, H.J.; Clark, D.S. , 1980 .
[2] Dimitri P. Bertsekas,et al. Constrained Optimization and Lagrange Multiplier Methods , 1982 .
[3] Keith W. Ross. Markov decision processes with constraints , 1984 .
[4] H. Kushner,et al. Weak convergence and asymptotic properties of adaptive filters with constant gains , 1984, IEEE Trans. Inf. Theory.
[5] Pravin Varaiya,et al. Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .
[6] Michael N. Katehakis,et al. The Multi-Armed Bandit Problem: Decomposition and Computation , 1987, Math. Oper. Res..
[7] Keith W. Ross,et al. Markov Decision Processes with Sample Path Constraints: The Communicating Case , 1989, Oper. Res..
[8] Paul Glasserman,et al. Gradient Estimation Via Perturbation Analysis , 1990 .
[9] H. Kushner,et al. Analysis of adaptive step size SA algorithms for parameter tracking , 1994, Proceedings of 1994 33rd IEEE Conference on Decision and Control.
[10] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[11] Dimitri P. Bertsekas,et al. Nonlinear Programming , 1997 .
[12] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[13] Vikram Krishnamurthy,et al. Iterative and recursive estimators for hidden Markov errors-in-variables models , 1996, IEEE Trans. Signal Process..
[14] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[15] Lorne G. Mason,et al. Adaptive decentralized control under non-uniqueness of the optimal control , 1996, Discret. Event Dyn. Syst..
[16] Harold J. Kushner,et al. Stochastic Approximation Algorithms and Applications , 1997, Applications of Mathematics.
[17] Keith W. Ross,et al. Multiservice Loss Models for Broadband Telecommunication Networks , 1997 .
[18] A Orman,et al. Optimization of Stochastic Models: The Interface Between Simulation and Optimization , 2012, J. Oper. Res. Soc..
[19] E. Altman. Constrained Markov Decision Processes , 1999 .
[20] Felisa J. Vázquez-Abad,et al. Strong points of weak convergence: a study using RPA gradient estimation for automatic learning , 1999, Autom..
[21] F. Vázquez-Abad,et al. Measure valued differentiation for stochastic processes : the finite horizon case , 2000 .
[22] Liyi Dai. Perturbation analysis via coupling , 2000, IEEE Trans. Autom. Control.
[23] J. Baxter,et al. Direct gradient-based reinforcement learning , 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).
[24] Alexander S. Poznyak,et al. Self-Learning Control of Finite Markov Chains , 2000 .
[25] H. Vincent Poor,et al. Integrated voice/data call admission control for wireless DS-CDMA systems , 2002, IEEE Trans. Signal Process..
[26] V. Krishnamurthy,et al. Implementation of gradient estimation to a constrained Markov decision problem , 2003, 42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475).
[27] David G. Luenberger,et al. Linear and Nonlinear Programming: Second Edition , 2003 .
[28] Paulo J. S. Silva,et al. Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms , 2004, Numerical Algorithms.