Focus of Attention in Reinforcement Learning
[1] Bruce A. Draper, et al. ADORE: Adaptive Object Recognition, 1999, ICVS.
[2] David G. Stork, et al. Pattern Classification, 1973.
[3] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[4] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[5] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[6] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[7] Russell Greiner, et al. Why Experimentation can be better than "Perfect Guidance", 1997, ICML.
[8] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn.
[9] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[10] J. Baxter, et al. Direct gradient-based reinforcement learning, 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).
[11] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[12] P. Bartlett, et al. Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent Algorithms and Experiments, 1999.
[13] Robert Givan, et al. Inductive Policy Selection for First-Order MDPs, 2002, UAI.
[14] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[15] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.
[16] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[17] Ben Tse, et al. Autonomous Inverted Helicopter Flight via Reinforcement Learning, 2004, ISER.
[18] David G. Stork, et al. Pattern Classification, 2nd Edition, 2000.
[19] John Langford, et al. Relating reinforcement learning performance to classification performance, 2005, ICML '05.
[20] Yishay Mansour, et al. Approximate Planning in Large POMDPs via Reusable Trajectories, 1999, NIPS.
[21] Vadim Bulitko, et al. Machine Learning for Adaptive Image Interpretation, 2004, AAAI.
[22] S. Shankar Sastry, et al. Autonomous Helicopter Flight via Reinforcement Learning, 2003, NIPS.
[23] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[24] Robert Givan, et al. Approximate Policy Iteration with a Policy Language Bias, 2003, NIPS.
[25] Vadim Bulitko, et al. Batch Reinforcement Learning with State Importance, 2004, ECML.
[26] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[27] Chris Watkins, et al. Learning from delayed rewards, 1989.
[28] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[29] David G. Stork, et al. Pattern Classification (2nd ed.), 1999.
[30] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[31] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[32] Satinder Singh, et al. An upper bound on the loss from approximate optimal-value functions, 1994, Machine Learning.
[33] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[34] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[35] S. Singh, et al. Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System, 2011, J. Artif. Intell. Res.
[36] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[37] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[38] Hamid R. Berenji, et al. A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters, 2003, IEEE Trans. Fuzzy Syst.
[39] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[40] Xin Wang, et al. Batch Value Function Approximation via Support Vectors, 2001, NIPS.
[41] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[42] Peter D. Turney. Types of Cost in Inductive Concept Learning, 2002, ArXiv.
[43] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.