Reinforcement learning for long-run average cost
[2] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[3] S. Matwin, et al. Learning Two-Tiered Descriptions of Flexible Concepts: The POSEIDON System, 1992, Machine Learning.
[4] Sridhar Mahadevan, et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results, 2004, Machine Learning.
[5] Tapas K. Das, et al. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking, 2002.
[6] Tamer Basar, et al. Analysis of Recursive Stochastic Algorithms, 2001.
[7] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[8] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[9] Sudeep Sarkar, et al. Optimal preventive maintenance in a production inventory system, 1999.
[10] S. Mahadevan, et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning, 1999.
[11] Abhijit Gosavi, et al. An algorithm for solving semi-Markov decision problems using reinforcement learning: convergence analysis and numerical results, 1999.
[12] Andrew G. Barto, et al. Reinforcement learning, 1998.
[13] L. Sennott. Stochastic Dynamic Programming and the Control of Queueing Systems, 1998.
[14] V. Borkar. Asynchronous Stochastic Approximations, 1998.
[15] V. Borkar, et al. An analog scheme for fixed point computation. I. Theory, 1997.
[16] V. Borkar. Stochastic approximation with two time scales, 1997.
[17] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[18] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.
[19] Prasad Tadepalli, et al. Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function, 1996, ICML.
[20] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[21] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[22] Satinder P. Singh, et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes, 1994, AAAI.
[23] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[24] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[25] Christos G. Cassandras, et al. Optimal inspection policies for a manufacturing station, 1992.
[26] Tadayoshi Shioyama. Optimal control of a queuing network system with two types of customers, 1991.
[27] Kumpati S. Narendra, et al. Learning automata - an introduction, 1989.
[28] Elmer E. Lewis, et al. Introduction To Reliability Engineering, 1987.
[29] P. Schweitzer, et al. Part selection policy for a flexible manufacturing cell feeding several production lines, 1984.
[30] Harold J. Kushner, et al. Stochastic approximation methods for constrained and unconstrained systems, 1978.
[31] R. Bellman, et al. On the Theory of Dynamic Programming, 1952, Proceedings of the National Academy of Sciences of the United States of America.