Double inverted pendulum control by linear quadratic regulator and reinforcement learning

The paper presents an original combination of a linear quadratic regulator and reinforcement learning for position control of a double inverted pendulum system. An agent based on a modified Sarsa algorithm swings the pendulum up. The linear quadratic regulator is then applied to the mathematical model of the process, linearized in the vicinity of the upright position. Digital simulation results illustrate the performance of the new approach.
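The two-stage scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. It uses the standard tabular Sarsa update (the paper's modification is not described in the abstract), and the state layout, the switching tolerance `angle_tol`, the gain vector, and all function names are assumptions made for this example.

```python
import random

def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.99):
    """Standard on-policy Sarsa step: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a))."""
    old = q.get((s, a), 0.0)
    target = reward + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]

def lqr_action(gain, state):
    """Linear state feedback u = -K x, with K given as a plain list (assumed precomputed)."""
    return -sum(k * x for k, x in zip(gain, state))

def near_upright(state, angle_tol=0.2):
    """Hypothetical switching test: both link angles (rad, measured from upright)
    lie within angle_tol. Assumed layout: state[1] and state[2] are the angles."""
    return abs(state[1]) < angle_tol and abs(state[2]) < angle_tol

def hybrid_control(state, gain, q, discretize, actions, epsilon, rng):
    """Use LQR near the upright position, the learned swing-up policy elsewhere."""
    if near_upright(state):
        return lqr_action(gain, state)
    return epsilon_greedy(q, discretize(state), actions, epsilon, rng)
```

In this sketch the gain `K` would come from solving the Riccati equation for the linearized model, and `discretize` would map the continuous state to a table index; both are left to the caller because the abstract does not specify them.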
