Paper information - Reinforcement learning for long-run average cost - 字舞流文

Reinforcement learning for long-run average cost

Abhijit Gosavi

[1] A simulation-based learning automata framework for solving semi-Markov decision problems under long-run average reward , 2004 .

[2] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.

[3] S. Matwin,et al. Learning Two-Tiered Descriptions of Flexible Concepts: The POSEIDON System , 1992, Machine Learning.

[4] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.

[5] Tapas K. Das,et al. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking , 2002 .

[6] Tamer Basar,et al. Analysis of Recursive Stochastic Algorithms , 2001 .

[7] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim.

[8] Vivek S. Borkar,et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim.

[9] Sudeep Sarkar,et al. Optimal preventive maintenance in a production inventory system , 1999 .

[10] S. Mahadevan,et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning , 1999 .

[11] Abhijit Gosavi,et al. An algorithm for solving semi-markov decision problems using reinforcement learning: convergence analysis and numerical results , 1999 .

[12] Andrew G. Barto,et al. Reinforcement learning , 1998 .

[13] L. Sennott. Stochastic Dynamic Programming and the Control of Queueing Systems , 1998 .

[14] V. Borkar. Asynchronous Stochastic Approximations , 1998 .

[15] V. Borkar,et al. An analog scheme for fixed point computation. I. Theory , 1997 .

[16] V. Borkar. Stochastic approximation with two time scales , 1997 .

[17] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[18] Michael L. Littman,et al. Algorithms for Sequential Decision Making , 1996 .

[19] Prasad Tadepalli,et al. Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function , 1996, ICML.

[20] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst.

[21] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[22] Satinder P. Singh,et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.

[23] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[24] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.

[25] Christos G. Cassandras,et al. Optimal inspection policies for a manufacturing station , 1992 .

[26] Tadayoshi Shioyama. Optimal control of a queuing network system with two types of customers , 1991 .

[27] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .

[28] Elmer E Lewis,et al. Introduction To Reliability Engineering , 1987 .

[29] P. Schweitzer,et al. Part selection policy for a flexible manufacturing cell feeding several production lines , 1984 .

[30] Harold J. Kushner,et al. Stochastic approximation methods for constrained and unconstrained systems , 1978 .

[31] R Bellman,et al. On the Theory of Dynamic Programming. , 1952, Proceedings of the National Academy of Sciences of the United States of America.