Learning to control an inverted pendulum using neural networks

An inverted pendulum is simulated as a control task with the goal of learning to balance the pendulum with no a priori knowledge of the dynamics. In contrast to other applications of neural networks to the inverted pendulum task, performance feedback is assumed to be unavailable on each step, appearing only as a failure signal when the pendulum falls or reaches the bounds of a horizontal track. To solve this task, the controller must deal with issues of delayed performance evaluation, learning under uncertainty, and the learning of nonlinear functions. Reinforcement and temporal-difference learning methods are presented that deal with these issues to avoid unstable conditions and balance the pendulum.
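The task setup described above can be illustrated with a minimal sketch. It is not the paper's implementation: the physical constants, the failure thresholds (12 degrees, 2.4 m), the Euler integration step, the discount factor, and all function names are assumptions taken from the commonly used cart-pole benchmark formulation. The sketch shows only the two ingredients the abstract names: a reinforcement signal that is zero on every step except failure, and a temporal-difference error that lets a learned prediction of failure stand in for the missing step-by-step feedback.

```python
import math

# Assumed constants (common cart-pole benchmark values, not from the abstract).
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
FORCE = 10.0                       # magnitude of the left/right push
DT = 0.02                          # Euler integration step in seconds
THETA_LIMIT = math.radians(12.0)   # assumed failure angle
X_LIMIT = 2.4                      # assumed track half-width in meters


def step(state, push_right):
    """Advance the simulated cart and pole by one Euler step."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if push_right else -FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass))
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)


def failure_signal(state):
    """Return -1.0 only when the pole falls or the cart leaves the track."""
    x, _, theta, _ = state
    return -1.0 if abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT else 0.0


def td_error(reward, value_now, value_next, failed, gamma=0.95):
    """TD(0) error; the value of a failed (terminal) state is taken as 0."""
    target = reward + (0.0 if failed else gamma * value_next)
    return target - value_now
```

In a learning loop under these assumptions, `value_now` and `value_next` would come from an evaluation network applied to successive states, and the resulting `td_error` would drive the weight updates of both the evaluation network and the action-selecting network.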
