Reinforcement Learning: A Survey
[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[2] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[3] F. d'Epenoux,et al. A Probabilistic Production and Inventory Problem , 1963 .
[4] R. Karp,et al. On Nonterminating Stochastic Games , 1966 .
[5] Cyrus Derman,et al. Finite State Markovian Decision Processes , 1970 .
[6] Kumpati S. Narendra,et al. Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..
[7] John H. Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .
[8] James S. Albus,et al. A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC) , 1975 .
[9] Raymond J. Bandlow. Theories of Learning, 4th Edition. By Ernest R. Hilgard and Gordon H. Bower. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1975 , 1976 .
[10] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[11] G. Siouris,et al. Optimum systems control , 1979, Proceedings of the IEEE.
[12] Alexander Graham,et al. Introduction to Control Theory, Including Optimal Control , 1980 .
[13] R. Mortensen. Introduction to Control Theory, Including Optimal Control (David Burghes and Alexander Graham) , 1982 .
[14] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[15] R.M. Dunn,et al. Brains, behavior, and robotics , 1983, Proceedings of the IEEE.
[16] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.
[17] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[18] Charles W. Anderson,et al. Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning) , 1986 .
[19] James L. McClelland, David E. Rumelhart and the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Foundations; Vol. 2: Psychological and Biological Models. Cambridge, MA: MIT Press , 1986 .
[20] Pravin Varaiya,et al. Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .
[21] R. Stengel. Stochastic Optimal Control: Theory and Application , 1986 .
[22] James L. McClelland,et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .
[23] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[24] Ullrich Rüde. Mathematical and Computational Techniques for Multilevel Adaptive Methods , 1987 .
[25] George E. P. Box,et al. Empirical Model‐Building and Response Surfaces , 1988 .
[26] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .
[27] David K. Smith. Theory of Linear and Integer Programming , 1987 .
[28] David E. Goldberg,et al. Genetic Algorithms in Search Optimization and Machine Learning , 1988 .
[29] W. Cleveland,et al. Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting , 1988 .
[30] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[31] C. Watkins. Learning from delayed rewards , 1989 .
[32] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[33] D. Bertsekas,et al. Adaptive aggregation methods for infinite horizon dynamic programming , 1989 .
[34] Christian M. Ernst,et al. Multi-armed Bandit Allocation Indices , 1989 .
[35] David H. Ackley,et al. Generalization and Scaling in Reinforcement Learning , 1989, NIPS.
[36] Jürgen Schmidhuber,et al. Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.
[37] Rodney A. Brooks,et al. Learning to Coordinate Behaviors , 1990, AAAI.
[38] Vijaykumar Gullapalli,et al. A stochastic reinforcement learning algorithm for learning real-valued functions , 1990, Neural Networks.
[39] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[40] A. Moore. Variable Resolution Dynamic Programming , 1991, ML.
[41] Hamid R. Berenji. Artificial Neural Networks and Approximate Reasoning for Intelligent Control in Space , 1991 .
[42] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.
[43] Sridhar Mahadevan,et al. Scaling Reinforcement Learning to Robotics by Exploiting the Subsumption Architecture , 1991, ML.
[44] Chuen-Chien Lee,et al. A self‐learning rule‐based controller employing approximate reasoning and neural net concepts , 1991, Int. J. Intell. Syst..
[45] Steven D. Whitehead,et al. Complexity and Cooperation in Q-Learning , 1991, ML.
[46] Long Ji Lin,et al. Programming Robots Using Reinforcement Learning and Teaching , 1991, AAAI.
[47] Richard S. Sutton,et al. Planning by Incremental Dynamic Programming , 1991, ML.
[48] Christopher G. Atkeson,et al. Memory-Based Learning Control , 1991, 1991 American Control Conference.
[49] Jürgen Schmidhuber,et al. Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[50] H. Berenji. Artificial Neural Networks and Approximate Reasoning for Intelligent Control in Space , 1991, 1991 American Control Conference.
[51] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes , 1991 .
[52] Anne Condon,et al. The Complexity of Stochastic Games , 1992, Inf. Comput..
[53] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[54] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[55] Gerald Tesauro,et al. Practical Issues in Temporal Difference Learning , 1992, Mach. Learn..
[56] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[57] Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.
[58] D. Sofge. THE ROLE OF EXPLORATION IN LEARNING CONTROL , 1992 .
[59] Satinder P. Singh,et al. Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.
[60] Vijaykumar Gullapalli,et al. Reinforcement learning and its application to control , 1992 .
[61] Long Lin,et al. Memory Approaches to Reinforcement Learning in Non-Markovian Domains , 1992 .
[62] C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time , 1993 .
[63] Paul M. B. Vitányi,et al. Theories of learning , 2007 .
[64] L.-J. Lin,et al. Hierarchical learning of robot skills by reinforcement , 1993, IEEE International Conference on Neural Networks.
[65] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[66] Roderic A. Grupen,et al. Robust Reinforcement Learning in Motion Planning , 1993, NIPS.
[67] Leslie Pack Kaelbling,et al. Planning With Deadlines in Stochastic Domains , 1993, AAAI.
[68] Terrence J. Sejnowski,et al. Temporal Difference Learning of Position Evaluation in the Game of Go , 1993, NIPS.
[69] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[70] Reid G. Simmons,et al. Complexity Analysis of Real-Time Reinforcement Learning , 1993, AAAI.
[71] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[72] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[73] Ronald J. Williams,et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Critic Learning Systems , 1993 .
[74] Sridhar Mahadevan,et al. Rapid Task Learning for Real Robots , 1993 .
[75] Satinder Singh,et al. Learning to Solve Markovian Decision Processes , 1993 .
[76] Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
[77] Dean A. Pomerleau,et al. Neural Network Perception for Mobile Robot Guidance , 1993 .
[78] J. Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, IEEE International Conference on Neural Networks.
[79] Leslie Pack Kaelbling,et al. Hierarchical Learning in Stochastic Domains: Preliminary Results , 1993, ICML.
[80] Lisa Meeden,et al. Emergent Control and Planning in an Autonomous Vehicle , 1993 .
[81] Leemon C Baird,et al. Reinforcement Learning With High-Dimensional, Continuous Actions , 1993 .
[82] Gary McGraw,et al. Emergent Control and Planning in an Autonomous Vehicle , 1993 .
[83] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[84] John Alan Kirman. Predicting real-time planner performance by domain characterization , 1994 .
[85] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[86] Michael I. Jordan,et al. Technical Report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .
[87] Leslie Pack Kaelbling,et al. Acting Optimally in Partially Observable Stochastic Domains , 1994, AAAI.
[88] Sridhar Mahadevan,et al. To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning , 1994, ICML.
[89] Sebastian Thrun,et al. Learning to Play the Game of Chess , 1994, NIPS.
[90] Michael L. Littman,et al. Memoryless policies: theoretical limitations and practical results , 1994 .
[91] Richard W. Prager,et al. A Modular Q-Learning Architecture for Manipulator Task Decomposition , 1994, ICML.
[92] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.
[93] Marco Colombetti,et al. Robot Shaping: Developing Autonomous Agents Through Learning , 1994, Artif. Intell..
[94] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[95] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[96] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[97] C. Fiechter. Efficient Reinforcement Learning , 1994 .
[98] S. Schaal,et al. Robot juggling: implementation of memory-based learning , 1994, IEEE Control Systems.
[99] Dave Cliff,et al. Adding Temporary Memory to ZCS , 1994, Adapt. Behav..
[100] Marco Dorigo,et al. A comparison of Q-learning and classifier systems , 1994 .
[101] Claude-Nicolas Fiechter,et al. Efficient reinforcement learning , 1994, COLT '94.
[102] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[103] Mark B. Ring. Continual learning in reinforcement environments , 1995, GMD-Bericht.
[104] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[105] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[106] Andrew McCallum,et al. Instance-Based Utile Distinctions for Reinforcement Learning , 1995 .
[107] Marcos Salganicoff,et al. Active Exploration and Learning in real-Valued Spaces using Multi-Armed Bandit Allocation Indices , 1995, ICML.
[108] Leslie Pack Kaelbling,et al. On the Complexity of Solving Markov Decision Problems , 1995, UAI.
[109] Stewart W. Wilson. Classifier Fitness Based on Accuracy , 1995, Evolutionary Computation.
[110] M. Dorigo. ALECSYS and the AutonoMouse: Learning to Control a Real Robot by Distributed Classifier Systems , 1995, Machine Learning.
[111] J. Mulawka. Fast and Efficient Reinforcement Learning with Truncated Temporal Differences , 1995 .
[112] Leslie Pack Kaelbling,et al. Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.
[113] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[114] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[115] Marco Dorigo,et al. Alecsys and the AutonoMouse: Learning to control a real robot by distributed classifier systems , 2004, Machine Learning.
[116] Pawel Cichosz,et al. Fast and Efficient Reinforcement Learning with Truncated Temporal Differences , 1995, ICML.
[117] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[118] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[119] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[120] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[121] Andrew McCallum,et al. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State , 1995, ICML.
[122] José del R. Millán,et al. Rapid, safe, and incremental learning of navigation strategies , 1996, IEEE Trans. Syst. Man Cybern. Part B.
[123] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.
[124] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[125] John Rust. Numerical dynamic programming in economics , 1996 .
[126] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.