Associative Reinforcement Learning: Functions in k-DNF

An agent that must learn to act in the world by trial and error faces the reinforcement learning problem, which differs from standard concept learning. Although good algorithms exist for the general case, they are often inefficient and do not generalize. One strategy is to find restricted classes of action policies that can be learned more efficiently. This paper pursues that strategy by developing algorithms that efficiently learn action maps expressible in k-DNF. The algorithms are compared with existing methods in empirical trials and are shown to perform very well.
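To make the term "action map expressible in k-DNF" concrete, the following is a minimal sketch, not the paper's learning algorithm, which the abstract does not describe. It assumes binary inputs and a binary action, and represents a k-DNF formula as a disjunction of conjunctive terms with at most k literals each. The names `term_holds`, `kdnf_action`, and `all_terms`, and the example formula, are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product

# A literal is (input_index, required_value).
# A term is a tuple of literals (a conjunction).
# A k-DNF formula is a list of terms (a disjunction), each with at most k literals.


def term_holds(term, x):
    """True if every literal in the term matches the input bits x."""
    return all(x[i] == v for i, v in term)


def kdnf_action(formula, x):
    """Return action 1 if any term is satisfied by input x, else action 0."""
    return int(any(term_holds(t, x) for t in formula))


def all_terms(n, k):
    """Enumerate every conjunction of 1..k literals over distinct input bits."""
    terms = []
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for vals in product((0, 1), repeat=size):
                terms.append(tuple(zip(idxs, vals)))
    return terms


# Hypothetical 2-DNF over 3 input bits: (x0 AND NOT x1) OR x2
formula = [((0, 1), (1, 0)), ((2, 1),)]

print(kdnf_action(formula, (1, 0, 0)))  # first term satisfied
print(kdnf_action(formula, (1, 1, 0)))  # no term satisfied
print(len(all_terms(3, 2)))             # number of candidate terms for n=3, k=2
```

For fixed k, the number of candidate terms grows polynomially in the number of input bits, which is one plausible reason a restriction to k-DNF could make learning more efficient than searching over unrestricted action maps.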
