Actor-critic-based ink drop spread as an intelligent controller

This paper introduces an adaptive controller based on the actor-critic method, with the ink drop spread (IDS) method as its main engine. IDS is a recent soft-computing technique: a universal fuzzy modeling method that has also been used as a supervised controller, and its processing closely resembles that of the human brain. In the proposed actor-critic scheme, an IDS structure serves as the actor, while a 2-dimensional plane representing the states of the control variables serves as the critic, estimating the long-term goodness of each state. The method is fast, simple, and free of mathematical complexity. Both the actor and the critic are updated with the temporal-difference (TD) method. The resulting system 1) learns to produce real-valued control actions in a continuous space without relying on a Markov decision process formulation, 2) adaptively improves its performance over its lifetime, and 3) scales well to high-dimensional problems. To demonstrate the method's effectiveness, we conduct experiments on three systems: an inverted pendulum, a ball and beam, and a two-wheeled balancing robot. In each case, the method converges to a suitable fuzzy system with significant improvements in rise time and overshoot compared to other fuzzy controllers.
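To make the structure concrete, the following is a minimal sketch of an actor-critic loop of the kind the abstract describes: a 2-dimensional critic plane over the state variables and an actor producing real-valued actions, both updated from the same TD error. The grid sizes, learning rates, Gaussian-basis actor, and function names are illustrative assumptions, not the paper's IDS implementation.

```python
# Hedged sketch of a TD(0) actor-critic with a 2-D grid critic.
# The IDS actor from the paper is replaced by a simple Gaussian-basis actor;
# all parameters below are assumed for illustration only.
import numpy as np


class GridCritic:
    """Critic: a 2-D plane of states whose cells estimate long-term goodness."""

    def __init__(self, bins=(21, 21), lo=(-1.0, -1.0), hi=(1.0, 1.0)):
        self.values = np.zeros(bins)
        self.lo, self.hi, self.bins = np.array(lo), np.array(hi), np.array(bins)

    def _cell(self, state):
        frac = (np.clip(state, self.lo, self.hi) - self.lo) / (self.hi - self.lo)
        return tuple((frac * (self.bins - 1)).astype(int))

    def value(self, state):
        return self.values[self._cell(state)]

    def update(self, state, td_error, lr=0.1):
        self.values[self._cell(state)] += lr * td_error


class SimpleActor:
    """Actor: produces a real-valued control action (placeholder for the IDS actor)."""

    def __init__(self, n_features=5):
        self.centers = np.linspace(-1.0, 1.0, n_features)
        self.weights = np.zeros((n_features, n_features))

    def _features(self, state):
        phi0 = np.exp(-((state[0] - self.centers) ** 2) / 0.1)
        phi1 = np.exp(-((state[1] - self.centers) ** 2) / 0.1)
        return np.outer(phi0, phi1)

    def mean_action(self, state):
        return float(np.sum(self.weights * self._features(state)))

    def act(self, state, noise_std=0.1):
        # Add exploration noise around the current policy output.
        return self.mean_action(state) + np.random.normal(0.0, noise_std)

    def update(self, state, td_error, exploration, lr=0.05):
        # Reinforce the explored perturbation in proportion to the TD error.
        self.weights += lr * td_error * exploration * self._features(state)


def td_step(actor, critic, state, action, next_state, reward, gamma=0.95):
    """One TD(0) update of both actor and critic from a single transition."""
    td_error = reward + gamma * critic.value(next_state) - critic.value(state)
    critic.update(state, td_error)
    actor.update(state, td_error, exploration=action - actor.mean_action(state))
    return td_error
```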
