Learning Robust Policies for Object Manipulation with Robot Swarms

Swarm robotics investigates how a large population of robots with simple actuation and limited sensing can collectively solve complex tasks. One particularly interesting application for robot swarms is autonomous object assembly. Such tasks have been solved successfully with robot swarms controlled by a human operator using a light source. In this paper, we present a method for solving such assembly tasks autonomously based on policy search methods. We split the assembly process into two subtasks: generating a high-level assembly plan and learning a low-level object movement policy. The assembly policy plans the trajectory for each object, and the object movement policy controls the trajectory execution. Learning the object movement policy is challenging, as it depends on the complex state of the swarm, which consists of an individual state for each agent. To approach this problem, we introduce a representation of the swarm based on Hilbert space embeddings of distributions. This representation is invariant both to the number of agents in the swarm and to the assignment of agents to positions within it. These invariances make the learned policy robust to changes in the swarm and also significantly reduce the search space for the policy search method. We show that the resulting system can solve assembly tasks with varying object shapes in multiple simulation scenarios, and we evaluate the robustness of our representation to changes in the swarm size. Furthermore, we demonstrate that the policies learned in simulation are robust enough to be transferred to real robots.
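The invariance claim above can be illustrated with a minimal sketch (not the paper's code): a kernel mean embedding of the agents' positions, approximated with random Fourier features for an RBF kernel. Because the embedding is an average of per-agent features, it does not change when the agents are relabeled, and it has a fixed dimension regardless of swarm size. The feature count `D` and the `bandwidth` are assumed hyperparameters for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64           # number of random features (assumed hyperparameter)
bandwidth = 0.5  # RBF kernel bandwidth (assumed)

# Random Fourier feature parameters approximating an RBF kernel in 2-D
W = rng.normal(scale=1.0 / bandwidth, size=(2, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def embed(positions):
    """Mean embedding of a set of (x, y) agent positions.

    Averaging over agents makes the result invariant to agent ordering
    and yields a fixed-size vector for any number of agents.
    """
    phi = np.sqrt(2.0 / D) * np.cos(positions @ W + b)  # (n_agents, D)
    return phi.mean(axis=0)                              # (D,)

swarm = rng.uniform(-1.0, 1.0, size=(50, 2))   # 50 agents
shuffled = rng.permutation(swarm)              # same agents, relabeled

# Permutation invariance: identical embedding after reordering agents
assert np.allclose(embed(swarm), embed(shuffled))
```

A policy conditioned on such an embedding sees a fixed-length input whether the swarm has 20 agents or 200, which is what allows the learned behavior to transfer across swarm sizes.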
