Sampling Efficiency in Learning Robot Motion

Policy Search (PS) algorithms are widely used nowadays for their simplicity and effectiveness in finding solutions to robotic problems. However, most current PS algorithms derive policies by statistically fitting only the data from the best experiments, meaning that experiments yielding poor performance are usually discarded or given little influence on the policy update. In this chapter, we propose a generalization of the Relative Entropy Policy Search (REPS) algorithm that also takes bad experiences into consideration when computing a policy. The proposed approach, named Dual REPS (DREPS) [1] after the philosophical interpretation of the duality between good and bad, finds clusters of experimental data yielding poor behavior and adds them to the optimization problem as repulsive constraints. Since good and bad samples are thus treated as dual sources of information, both are exploited in the stochastic search for a policy. Additionally, a cluster containing the best samples may be included as an attractor to speed up convergence to a single optimal solution in multimodal problems.
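To make the idea concrete, the sketch below shows a heavily simplified, hypothetical DREPS-style update: a REPS-like exponential weighting of sampled policy parameters by their returns, followed by a weighted Gaussian fit and a heuristic repulsion of the fitted mean away from the centroid of the worst samples. In the actual algorithm [1], the temperature is obtained by solving the REPS dual problem and the bad clusters enter as constraints on the search distribution; here both are replaced by fixed hyperparameters for brevity, and all names (`dreps_update`, `eta`, `bad_frac`, `repel`) are illustrative, not the authors' notation.

```python
import numpy as np

def dreps_update(theta, returns, eta=1.0, bad_frac=0.2, repel=0.5):
    """Illustrative, simplified DREPS-style update (not the original formulation).

    theta:    (N, d) array of sampled policy parameters
    returns:  (N,)   return obtained by each sample
    eta:      fixed temperature (REPS solves for it via a dual problem)
    bad_frac: fraction of worst samples forming the repulsive cluster
    repel:    step size of the repulsion away from the bad-cluster centroid
    """
    # REPS-style exponential weighting of samples by return
    adv = returns - returns.max()              # shift for numerical stability
    w = np.exp(adv / eta)
    w /= w.sum()

    # Weighted maximum-likelihood fit of the new Gaussian search distribution
    new_mean = w @ theta
    diff = theta - new_mean
    new_cov = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(theta.shape[1])

    # Heuristic repulsive term: push the mean away from the centroid
    # of the worst-performing samples (the "bad" cluster)
    n_bad = max(1, int(bad_frac * len(returns)))
    bad_centroid = theta[np.argsort(returns)[:n_bad]].mean(axis=0)
    direction = new_mean - bad_centroid
    norm = np.linalg.norm(direction)
    if norm > 1e-8:
        new_mean = new_mean + repel * direction / norm

    return new_mean, new_cov

# Toy usage: search for parameters maximizing a quadratic reward
rng = np.random.default_rng(0)
mean, cov = np.zeros(2), np.eye(2)
for _ in range(20):
    theta = rng.multivariate_normal(mean, cov, size=50)
    returns = -np.sum((theta - np.array([2.0, -1.0])) ** 2, axis=1)
    mean, cov = dreps_update(theta, returns)
```

An attractive constraint toward the best cluster, as mentioned above, could be sketched analogously by nudging the mean toward the centroid of the top-scoring samples.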