Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning
[1] J. F. White. Models of Preventive Maintenance, 1978.
[2] Averill M. Law, et al. Simulation Modeling and Analysis, 1982.
[3] Randall P. Sadowski, et al. Introduction to Simulation Using Siman, 1990.
[4] F. A. van der Duyn Schouten, et al. Maintenance optimization of a production system with buffer capacity, 1995.
[5] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[6] Satinder P. Singh, et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes, 1994, AAAI.
[7] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[8] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.
[9] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[10] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[11] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[12] Prasad Tadepalli, et al. Auto-Exploratory Average Reward Reinforcement Learning, 1996, AAAI/IAAI, Vol. 1.
[13] Sudeep Sarkar, et al. Optimal preventive maintenance in a production inventory system, 1999.
[14] Sridhar Mahadevan, et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results, 2004, Machine Learning.
[15] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.