Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning
[1] Samuel Karlin, et al. The structure of dynamic programing models, 1955.
[2] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[3] D. White, et al. Dynamic programming, Markov chains, and the method of successive approximations, 1963.
[4] D. Blackwell. Discounted Dynamic Programming, 1965.
[5] Rutherford Aris, et al. Discrete Dynamic Programming, 1965, The Mathematical Gazette.
[6] Ilya B. Gertsbakh, et al. Models of Preventive Maintenance, 1977.
[7] J. F. White. Models of Preventive Maintenance, 1978.
[8] Sheldon M. Ross, et al. Stochastic Processes, 2018, Gauge Integral Structures for Stochastic Calculus and Quantum Electrodynamics.
[9] Averill M. Law, et al. Simulation Modeling and Analysis, 1982.
[10] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[11] Richard Wheeler, et al. Decentralized learning in finite Markov chains, 1985, 1985 24th IEEE Conference on Decision and Control.
[12] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[13] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SGAR.
[14] John Moody, et al. Learning rate schedules for faster stochastic gradient search, 1992, Neural Networks for Signal Processing II Proceedings of the 1992 IEEE Workshop.
[15] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[16] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.
[17] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[18] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[19] Wei Zhang, et al. A Reinforcement Learning Approach to job-shop Scheduling, 1995, IJCAI.
[20] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[21] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[22] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[23] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[24] Prasad Tadepalli, et al. Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function, 1996, ICML.
[25] Vivek S. Borkar, et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms, 1997, SIAM J. Control. Optim.
[26] Benjamin Van Roy, et al. A neuro-dynamic programming approach to retailer inventory management, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[27] Andrew G. Barto, et al. Reinforcement learning, 1998.
[28] Sridhar Mahadevan, et al. Optimizing Production Manufacturing Using Reinforcement Learning, 1998, FLAIRS.
[29] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[30] Sudeep Sarkar, et al. Optimal preventive maintenance in a production inventory system, 1999.
[31] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.