Explorations of the practical issues of learning prediction-control tasks using temporal difference learning methods

Abstract: There has been recent interest in using a class of incremental learning algorithms called temporal difference (TD) learning methods to attack problems of prediction. These algorithms have been applied to various prediction problems in the past but remain poorly understood. The purpose of this thesis is to explore this class of algorithms further, particularly the TD(λ) algorithm. A number of practical issues are raised, discussed from a general theoretical perspective, and then explored in several case studies. The thesis presents a framework for viewing these algorithms independently of the particular task at hand and uses this framework to explore not only pure prediction tasks but also prediction tasks that require control, whether complete or partial. This includes applying the TD(λ) algorithm to two tasks: (1) learning to play tic-tac-toe from the outcomes of self-play and of play against a perfectly playing opponent, and (2) learning two simple one-dimensional image segmentation tasks.
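The abstract does not spell out the TD(λ) update itself. The Python below is a minimal sketch that assumes a tabular value function and accumulating eligibility traces, which is the standard textbook formulation and not necessarily the exact variant studied in the thesis. Each prediction is nudged toward its successor prediction by the TD error, and the eligibility traces spread that correction back over recently visited states. The prediction task, a five-state bounded random walk where stepping off the right end yields an outcome of 1 and stepping off the left end yields 0, and the function and parameter names are illustrative assumptions that do not come from the thesis.

```python
import random


def td_lambda_random_walk(num_states=5, alpha=0.01, lam=0.8, gamma=1.0,
                          episodes=5000, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Hypothetical prediction task (not from the thesis): a bounded random walk
    over non-terminal states 0..num_states-1. Leaving on the right gives a
    final outcome of 1, leaving on the left gives 0. Returns the learned
    predictions of the final outcome from each state.
    """
    rng = random.Random(seed)
    values = [0.5] * num_states  # initial predictions
    for _ in range(episodes):
        traces = [0.0] * num_states  # eligibility traces, reset each episode
        state = num_states // 2      # every walk starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD error: difference between successive predictions.
            delta = reward + gamma * next_value - values[state]
            # Accumulating trace: the current state becomes more eligible.
            traces[state] += 1.0
            # Credit every state in proportion to its eligibility, then decay.
            for s in range(num_states):
                values[s] += alpha * delta * traces[s]
                traces[s] *= gamma * lam
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    learned = td_lambda_random_walk()
    true_probs = [(i + 1) / 6 for i in range(5)]
    for s, (v, p) in enumerate(zip(learned, true_probs)):
        print(f"state {s}: learned {v:.3f}, true {p:.3f}")
```

For this walk the true probabilities of ending on the right are 1/6, 2/6, ..., 5/6, so the learned predictions should settle near those values. Setting `lam=0` gives one-step TD(0), and `lam=1` gives a Monte Carlo-style update that waits, in effect, for the final outcome. The small constant step size is a simplicity choice that keeps the estimates stable; a decaying step size would be another option.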
