Modelling inter-task relations to transfer robot skills with three-way RBMs

Transfer learning in Reinforcement Learning (RL) is a promising approach for humanoid robots to learn new skills and adapt to new situations as tasks or environments change. In the usual transfer learning pipeline for RL, the inter-task mapping is defined manually, which limits generalization; learning inter-task relations automatically has therefore become an active research topic. Given the limited computational resources of a physical humanoid robot, skill transfer must be efficient both in algorithm speed and in sample complexity. With this in mind, this research models the inter-task relations with a three-way Restricted Boltzmann Machine (RBM), which turns out to be a powerful model for capturing the similarity between samples from the source task and the target task. Because the standard Contrastive Divergence (CD) algorithm commonly used for RBM training suffers from the input-independent problem and can make learning slow or even impractical, a Cyclic Contrastive Divergence (CCD) learning algorithm is employed instead. To evaluate the approach, an experiment transferring the skill of walking on a flat surface to walking on a slope is conducted on our physical robot platform, PKU-HR5.1; the results indicate that the method is feasible and efficient.
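
To make the modelling step concrete, below is a minimal sketch (not the authors' implementation) of a factored three-way RBM over pairs of source-task and target-task feature vectors, trained here with plain one-step Contrastive Divergence. All dimensions, the learning rate, and the toy data are illustrative assumptions; the paper's CCD algorithm would replace the CD-1 gradient step used in this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: source-task features, target-task features, hidden units, factors.
n_x, n_y, n_h, n_f = 8, 8, 16, 12

Wx = 0.01 * rng.standard_normal((n_x, n_f))   # factor weights, source side
Wy = 0.01 * rng.standard_normal((n_y, n_f))   # factor weights, target side
Wh = 0.01 * rng.standard_normal((n_h, n_f))   # factor weights, hidden side
by = np.zeros(n_y)                            # target-unit biases
bh = np.zeros(n_h)                            # hidden-unit biases

def hidden_probs(x, y):
    # p(h = 1 | x, y): each factor multiplies the source and target projections.
    return sigmoid(((x @ Wx) * (y @ Wy)) @ Wh.T + bh)

def target_probs(x, h):
    # p(y = 1 | x, h): the source and hidden projections gate the target units.
    return sigmoid(((x @ Wx) * (h @ Wh)) @ Wy.T + by)

def cd1_step(x, y, lr=0.05):
    """One plain CD-1 update for a single binary (source, target) pair."""
    global Wx, Wy, Wh, by, bh
    # Positive phase: hidden activations driven by the data pair.
    h0 = hidden_probs(x, y)
    # Negative phase: sample hiddens, reconstruct the target vector, re-infer hiddens.
    h_samp = (h0 > rng.random(n_h)).astype(float)
    y1 = (target_probs(x, h_samp) > rng.random(n_y)).astype(float)
    h1 = hidden_probs(x, y1)
    # Per-factor projections shared by the three weight gradients.
    px, py0, py1 = x @ Wx, y @ Wy, y1 @ Wy
    ph0, ph1 = h0 @ Wh, h1 @ Wh
    Wx += lr * (np.outer(x, py0 * ph0) - np.outer(x, py1 * ph1))
    Wy += lr * (np.outer(y, px * ph0) - np.outer(y1, px * ph1))
    Wh += lr * (np.outer(h0, px * py0) - np.outer(h1, px * py1))
    by += lr * (y - y1)
    bh += lr * (h0 - h1)

# Toy usage: random binary pairs stand in for corresponding samples
# collected from the source task (flat walking) and target task (slope walking).
for _ in range(200):
    x = (rng.random(n_x) > 0.5).astype(float)
    cd1_step(x, x.copy())
```

The multiplicative factors are what let the model score how well a target-task sample "goes with" a source-task sample, which is the relational structure exploited for transfer.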
