A comparison of supervised and reinforcement learning methods on a reinforcement learning task

The forward modeling approach of M. I. Jordan and D. E. Rumelhart (1990) allows supervised learning methods to be applied to reinforcement learning tasks. Because such tasks are natural candidates for reinforcement learning methods, there is a need to evaluate the relative merits of the two approaches on them. The author presents one such comparison on a task that involves learning to control an unstable, nonminimum-phase dynamic system. The comparison shows that the reinforcement learning method performs better than the supervised learning method. An examination of the learning behavior of the two methods indicates that the performance difference can be attributed to their underlying learning mechanics, which suggests that similar differences can be expected on other reinforcement learning tasks as well.
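To make the contrast in learning mechanics concrete, the following Python sketch compares the two approaches on a deliberately simple, hypothetical scalar plant (a stand-in invented here, not the unstable, nonminimum-phase system studied in the paper). The first loop trains a linear controller by backpropagating the output cost through a forward model of the plant, in the spirit of Jordan and Rumelhart's distal-teacher scheme; the second trains the same controller directly, in the spirit of the stochastic real-valued (SRV) algorithm of reference [1], by correlating Gaussian output perturbations with the scalar evaluation received. All gains, noise levels, and iteration counts are illustrative choices, not values from the paper.

```python
# Illustrative sketch only: a toy plant and hypothetical hyperparameters,
# not the system, controllers, or code used in the paper.
import numpy as np

rng = np.random.default_rng(0)

def plant(x, u):
    """Toy scalar plant: next state from state x and control u (unstable open loop)."""
    return 1.1 * x + 0.5 * u

def cost(x_next):
    """Quadratic regulation cost; the controller should drive the state toward zero."""
    return x_next ** 2

# --- Supervised learning through a forward model (distal-teacher style) -------
# The linear controller u = w * x is trained by backpropagating the cost at the
# plant output through a differentiable model of the plant (assumed known here).
w_fm = 0.0
alpha = 0.05
for _ in range(500):
    x = rng.uniform(-1.0, 1.0)
    u = w_fm * x
    x_next = plant(x, u)
    # Chain rule: d cost / d w = 2 * x_next * (d x_next / d u) * (d u / d w)
    grad = 2.0 * x_next * 0.5 * x
    w_fm -= alpha * grad

# --- Direct reinforcement learning (SRV-style stochastic real-valued unit) ----
# The controller adds Gaussian exploration noise to its output and correlates
# the perturbation with the scalar evaluation it receives; no plant model is used.
w_rl = 0.0
sigma = 0.3      # exploration noise, held fixed for simplicity
beta = 0.05
baseline = 0.0   # running estimate of the expected reinforcement
for _ in range(3000):
    x = rng.uniform(-1.0, 1.0)
    mean_u = w_rl * x
    u = mean_u + sigma * rng.standard_normal()
    r = -cost(plant(x, u))                 # reinforcement = negative cost
    baseline += 0.1 * (r - baseline)
    # Move the mean toward perturbations that did better than the baseline.
    w_rl += beta * (r - baseline) * ((u - mean_u) / sigma) * x

# Both gains should approach the value that stabilizes the toy plant (about -2.2).
print(f"forward-model controller gain: {w_fm:+.3f}")
print(f"direct RL controller gain:     {w_rl:+.3f}")
```

The mechanical difference is visible in the sketch: the supervised route needs a differentiable path from the evaluation back to the controller (the forward model), while the direct reinforcement route estimates that gradient from output perturbations and the scalar evaluation alone.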

[1] Vijaykumar Gullapalli, et al. A stochastic reinforcement learning algorithm for learning real-valued functions, 1990, Neural Networks.

[2] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.

[3] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[4] V. Gullapalli, et al. Associative reinforcement learning of real-valued functions, 1991, Conference Proceedings 1991 IEEE International Conference on Systems, Man, and Cybernetics.

[5] C. W. Anderson, et al. Learning to control an inverted pendulum using neural networks, 1989, IEEE Control Systems Magazine.

[6] Michael I. Jordan, et al. Learning to Control an Unstable System with Forward Modeling, 1989, NIPS.

[7] Alexis P. Wieland, et al. Evolving Controls for Unstable Systems, 1991.

[8] Paul E. Utgoff, et al. Learning to control a dynamic physical system, 1987, Comput. Intell.

[9] Michael I. Jordan, et al. Forward Models: Supervised Learning with a Distal Teacher, 1992, Cogn. Sci.

[10] Paul J. Werbos, et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research, 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[11] Richard S. Sutton, et al. Training and Tracking in Robotics, 1985, IJCAI.

[12] A. P. Wieland, et al. Evolving neural network controllers for unstable systems, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.