Reinforcement learning for long-run average cost

[1] A simulation-based learning automata framework for solving semi-Markov decision problems under long-run average reward, 2004.

[2] John N. Tsitsiklis et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.

[3] S. Matwin et al. Learning Two-Tiered Descriptions of Flexible Concepts: The POSEIDON System, 1992, Machine Learning.

[4] Sridhar Mahadevan et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results, 1996, Machine Learning.

[5] Tapas K. Das et al. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking, 2002.

[6] Tamer Basar et al. Analysis of Recursive Stochastic Algorithms, 2001.

[7] Sean P. Meyn et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control Optim.

[8] Vivek S. Borkar et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control Optim.

[9] Sudeep Sarkar et al. Optimal preventive maintenance in a production inventory system, 1999.

[10] S. Mahadevan et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning, 1999.

[11] Abhijit Gosavi et al. An algorithm for solving semi-Markov decision problems using reinforcement learning: convergence analysis and numerical results, 1999.

[12] Andrew G. Barto et al. Reinforcement learning, 1998.

[13] L. Sennott. Stochastic Dynamic Programming and the Control of Queueing Systems, 1998.

[14] V. Borkar. Asynchronous Stochastic Approximations, 1998.

[15] V. Borkar et al. An analog scheme for fixed point computation. I. Theory, 1997.

[16] V. Borkar. Stochastic approximation with two time scales, 1997.

[17] John N. Tsitsiklis et al. Neuro-Dynamic Programming, 1996, Athena Scientific.

[18] Michael L. Littman et al. Algorithms for Sequential Decision Making, 1996.

[19] Prasad Tadepalli et al. Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function, 1996, ICML.

[20] Ben J. A. Kröse et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[21] Dimitri P. Bertsekas et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[22] Satinder P. Singh et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes, 1994, AAAI.

[23] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[24] Anton Schwartz et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.

[25] Christos G. Cassandras et al. Optimal inspection policies for a manufacturing station, 1992.

[26] Tadayoshi Shioyama. Optimal control of a queuing network system with two types of customers, 1991.

[27] Kumpati S. Narendra et al. Learning automata: an introduction, 1989.

[28] Elmer E. Lewis et al. Introduction to Reliability Engineering, 1987.

[29] P. Schweitzer et al. Part selection policy for a flexible manufacturing cell feeding several production lines, 1984.

[30] Harold J. Kushner et al. Stochastic Approximation Methods for Constrained and Unconstrained Systems, 1978.

[31] R. Bellman. On the Theory of Dynamic Programming, 1952, Proceedings of the National Academy of Sciences of the United States of America.