Average reward reinforcement learning: Foundations, algorithms, and empirical results
[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[2] D. White,et al. Dynamic programming, Markov chains, and the method of successive approximations , 1963 .
[3] Rutherford Aris,et al. Discrete Dynamic Programming , 1965, The Mathematical Gazette.
[4] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria , 1969 .
[5] Eric V. Denardo,et al. Computing a Bias-Optimal Policy in a Discrete-Time Markov Decision Problem , 1970, Oper. Res..
[6] A. Hordijk,et al. A Modified Form of the Iterative Method of Dynamic Programming , 1975 .
[7] Paul J. Schweitzer,et al. Successive Approximation Methods for Solving Nested Functional Equations in Markov Decision Problems , 1984, Math. Oper. Res..
[8] Richard Wheeler,et al. Decentralized learning in finite Markov chains , 1985, 1985 24th IEEE Conference on Decision and Control.
[9] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[10] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[11] A. Jalali,et al. Computationally efficient adaptive control algorithms for Markov chains , 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[12] Joseph F. Engelberger,et al. Robotics in Service , 1989 .
[13] A. Jalali,et al. A distributed asynchronous algorithm for expected average cost dynamic programming , 1990, 29th IEEE Conference on Decision and Control.
[14] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[15] M. Puterman,et al. An improved algorithm for solving communicating average reward Markov decision processes , 1991 .
[16] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[17] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[18] Sridhar Mahadevan,et al. Enhancing Transfer in Reinforcement Learning by Building Stochastic Models of Robot Actions , 1992, ML.
[19] Tom M. Mitchell,et al. A Personal Learning Apprentice , 1992, AAAI.
[20] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[21] D. Sofge. The Role of Exploration in Learning Control , 1992 .
[22] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[23] Marcos Salganicoff,et al. Density-Adaptive Learning and Forgetting , 1993, ICML.
[24] Jonas Karlsson,et al. Learning Multiple Goal Behavior via Task Decomposition and Dynamic Policy Merging , 1993 .
[25] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[26] Satinder Singh,et al. Learning to Solve Markovian Decision Processes , 1993 .
[27] Leslie Pack Kaelbling,et al. Hierarchical Learning in Stochastic Domains: Preliminary Results , 1993, ICML.
[28] Satinder P. Singh,et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.
[29] Sridhar Mahadevan,et al. To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning , 1994, ICML.
[30] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[31] Prasad Tadepalli,et al. H-Learning: A Reinforcement Learning Method for Optimizing Undiscounted Average Reward , 1994 .
[32] Craig Boutilier,et al. Process-Oriented Planning and Average-Reward Optimality , 1995, IJCAI.
[33] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[34] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[35] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[36] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[37] Gerald Tesauro,et al. Practical issues in temporal difference learning , 1992, Machine Learning.
[38] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[39] J. Walrand,et al. Distributed Dynamic Programming , 2022 .