Paper information - A Hybrid Learning Strategy for Real Hardware of Swing-Up Pendulum

A Hybrid Learning Strategy for Real Hardware of Swing-Up Pendulum

Bottom-up learning approaches such as neural networks, when used to obtain an optimal controller for a target task on a mechanical system, generally require a huge number of trials, which takes much time and puts stress on the hardware. To avoid these problems, a simulator is often built and the learning method is run on it instead. However, this raises further problems: how to construct the simulator and how accurately it reproduces the real system. In this paper, we consider constructing a simulator directly from the real hardware. The constructed simulator is then used to learn the target task, and the resulting optimal controller is applied to the real hardware. As an example, we take the pendulum swing-up task, a typical nonlinear control problem. The simulator is constructed by training a neural network with backpropagation, and the optimal controller is obtained by reinforcement learning. Both processes run without the real hardware once the data have been sampled, so the load on the hardware becomes much smaller and the target controller can be obtained faster than when using the hardware alone. We consider that the proposed method can serve as a basic learning strategy for obtaining optimal controllers of mechanical systems.
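The simulator-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no network size, input encoding, or sampling scheme, so everything below is an assumption. In particular, `hardware_step` is a hypothetical stand-in for data sampled from the real pendulum (with made-up physical parameters), the (sin, cos) angle encoding, the 32-unit tanh hidden layer, and the learning rate are illustrative choices, and the reinforcement learning stage that would then run on `simulator_step` is omitted.

```python
import numpy as np

DT = 0.02  # assumed sampling period in seconds

def hardware_step(theta, omega, torque, g=9.8, length=1.0, mass=1.0, damping=0.1):
    """Stand-in for the real pendulum (hypothetical parameters): one Euler step."""
    alpha = -(g / length) * np.sin(theta) - damping * omega + torque / (mass * length**2)
    return theta + DT * omega, omega + DT * alpha

def features(theta, omega, torque):
    # Encode the angle as (sin, cos) so the model input is continuous in the angle.
    return np.stack([np.sin(theta), np.cos(theta), omega, torque], axis=1)

# 1) Sample transitions once; this is the only stage that would touch hardware.
rng = np.random.default_rng(0)
n = 1000
theta = rng.uniform(-np.pi, np.pi, n)
omega = rng.uniform(-8.0, 8.0, n)
torque = rng.uniform(-2.0, 2.0, n)
next_theta, next_omega = hardware_step(theta, omega, torque)

# 2) Build normalized training data: inputs -> change of state over one step.
x = features(theta, omega, torque)
y = np.stack([next_theta - theta, next_omega - omega], axis=1)
x_mean, x_std = x.mean(0), x.std(0)
y_mean, y_std = y.mean(0), y.std(0)
xn, yn = (x - x_mean) / x_std, (y - y_mean) / y_std

# 3) One-hidden-layer network trained by backpropagation (full-batch gradient descent).
hidden = 32
w1 = rng.normal(0.0, 0.5, (4, hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(0.0, hidden ** -0.5, (hidden, 2))
b2 = np.zeros(2)

def forward(inp):
    h = np.tanh(inp @ w1 + b1)
    return h, h @ w2 + b2

def loss_of(pred, target):
    return 0.5 * np.mean(np.sum((pred - target) ** 2, axis=1))

initial_loss = loss_of(forward(xn)[1], yn)
lr = 0.02
for _ in range(2000):
    h, pred = forward(xn)
    d_pred = (pred - yn) / n                 # gradient of the loss w.r.t. the output
    d_h = (d_pred @ w2.T) * (1.0 - h ** 2)   # backpropagate through tanh
    w2 -= lr * (h.T @ d_pred)
    b2 -= lr * d_pred.sum(0)
    w1 -= lr * (xn.T @ d_h)
    b1 -= lr * d_h.sum(0)
final_loss = loss_of(forward(xn)[1], yn)

def simulator_step(theta, omega, torque):
    """Learned simulator: predicts the next state without using the hardware."""
    inp = features(np.atleast_1d(theta), np.atleast_1d(omega), np.atleast_1d(torque))
    delta = forward((inp - x_mean) / x_std)[1] * y_std + y_mean
    return theta + delta[:, 0], omega + delta[:, 1]
```

Once trained, `simulator_step` can replace the hardware inside a learning loop, which is the point of the strategy: the many trials needed by the controller-learning stage are spent on the learned model rather than on the physical pendulum.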

Shuji Hashimoto | Ryo Saegusa | Shingo Nakamura
