Reinforcement Learning: a Brief Overview

Learning is considered an essential aspect of intelligence. It usually takes place in a context where one learns from an environment. Learning comes in various forms: how to learn and what to learn. Here we are concerned with learning informal concepts. Informal concepts occur in many forms: heuristics, personal judgements, utterances about taste, and so on. Such concepts pose major difficulties: 1) Informal concepts have no precise definition, and often no definition at all. 2) Informal concepts are subjective; their interpretation depends on persons or groups of persons. 3) What matters is not the concepts themselves but the way they are used; this use is manifold, but mainly connected with decisions for or against a behavior or an action. 4) Both the concepts and their use have to be learned. 5) There is no sharp measure of what ‘successful learning’ means: learning success is itself imprecise. As a consequence, the approximative character of the learning process is central.
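The idea of learning from an environment through decisions for or against actions is the core of reinforcement learning. As an illustrative sketch only (not taken from the paper), the following shows tabular Q-learning on a hypothetical 5-state chain world, where an agent gradually approximates the value of each action from delayed reward; the environment, state count, and learning parameters are all assumptions chosen for the example.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a small chain: states 0..n_states-1,
    actions 0 (step left) and 1 (step right); reaching the last
    state ends the episode and pays reward +1, all other steps pay 0."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q-value table, one row per state
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection: mostly exploit, sometimes explore
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            # deterministic transition: left or right, clipped to the chain ends
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # temporal-difference update toward the bootstrapped target
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning_chain()
# Greedy policy extracted from the learned values for the non-terminal states.
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(4)]
```

After training, the greedy action in every non-terminal state should be "right" (action 1), since only the rightmost state is rewarded; the learned Q-values are approximations that sharpen with more episodes, echoing the point that learning success is itself a matter of degree.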
