The Body is Not a Given: Joint Agent Policy Learning and Morphology Evolution

Reinforcement learning (RL) has proven to be a powerful paradigm for deriving complex behaviors from simple reward signals in a wide range of environments. When RL is applied to continuous control agents in simulated physics environments, the body is usually treated as a fixed part of the environment. In nature, however, the physical bodies of organisms co-evolve with the brains that control them, exploring a much larger space of actuator/controller configurations. Put differently, intelligence resides not only in an agent's mind but also in the design of its body. We propose a method for discovering strong agents, each consisting of a well-matched body and policy, by combining RL with an evolutionary procedure. Given the resulting agent, we also propose an approach for identifying the body changes that contributed most to the agent's performance. We use the Shapley value from cooperative game theory to compute the fair contribution of individual body components, taking into account synergies between components. We evaluate our methods in an environment similar to the recently proposed Robo-Sumo task, in which agents in a physics simulator compete to tip over their opponent or push them out of the arena. Our results show that the proposed methods are indeed capable of generating strong agents, significantly outperforming baselines that optimize the agent's policy alone. A video is available at: https://youtu.be/CHlecRim9PI
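The abstract gives no pseudocode, but the overall scheme can be pictured as an outer evolutionary loop over morphologies wrapped around an inner RL loop over policies. The sketch below is a minimal illustration under assumptions, not the paper's exact algorithm: `train_policy`, `evaluate`, and `mutate` are hypothetical stand-ins for the inner RL step (e.g. a policy-gradient update in the simulator), tournament evaluation, and body perturbation, and the "policy" here is reduced to a scalar skill so the example runs on its own.

```python
import random

rng = random.Random(0)

def mutate(morph):
    """Perturb one body parameter (hypothetical mutation scheme)."""
    m = dict(morph)
    key = rng.choice(list(m))
    m[key] = max(0.1, m[key] + rng.gauss(0.0, 0.1))
    return m

def train_policy(morph, policy):
    """Stand-in for the inner RL loop on a fixed body.

    "Training" just moves a scalar skill toward a morphology-dependent
    ceiling; a real implementation would update policy-network weights
    by RL in the physics simulator.
    """
    ceiling = 1.0 / (1.0 + abs(morph["leg_len"] - 0.8))
    skill = 0.0 if policy is None else policy
    return skill + 0.3 * (ceiling - skill)

def evaluate(morph, policy):
    """Stand-in for fitness, e.g. a tournament win rate."""
    return policy

def evolve(generations=20, pop_size=8):
    # Population of (morphology, policy) pairs; policies start untrained.
    population = [({"leg_len": rng.uniform(0.2, 1.5)}, None)
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Inner loop: each body gets its policy trained by RL.
        trained = [(m, train_policy(m, p)) for m, p in population]
        trained.sort(key=lambda mp: evaluate(*mp), reverse=True)
        survivors = trained[: pop_size // 2]
        # Outer loop: mutate the bodies of the fittest agents; children
        # inherit the parent's trained policy as a warm start.
        children = [(mutate(m), p) for m, p in survivors]
        population = survivors + children
    return max(trained, key=lambda mp: evaluate(*mp))

if __name__ == "__main__":
    best_morph, best_policy = evolve()
    print(best_morph, best_policy)
```

For the attribution step, the Shapley value of a body component is its average marginal contribution over all orderings of the components, which is commonly estimated by sampling random permutations. Below is a minimal Monte Carlo sketch; `toy_win_rate` is an invented value function standing in for the measured performance of an agent whose body includes a given subset of changes, and the component names are illustrative only.

```python
import random

def shapley_values(components, value_fn, num_samples=2000, seed=0):
    """Estimate Shapley values by sampling random permutations.

    Each component is credited with its marginal contribution: the
    change in value_fn when it joins the coalition of the components
    preceding it in a uniformly random ordering.
    """
    rng = random.Random(seed)
    totals = {c: 0.0 for c in components}
    for _ in range(num_samples):
        order = list(components)
        rng.shuffle(order)
        coalition = set()
        prev = value_fn(coalition)
        for c in order:
            coalition.add(c)
            cur = value_fn(coalition)
            totals[c] += cur - prev
            prev = cur
    return {c: t / num_samples for c, t in totals.items()}

def toy_win_rate(subset):
    """Invented value function: win rate of an agent whose body has
    this subset of changes. Longer legs and wider feet only help in
    combination (a synergy)."""
    rate = 0.5
    if "longer_legs" in subset and "wider_feet" in subset:
        rate += 0.3
    if "heavier_torso" in subset:
        rate -= 0.1
    return rate

print(shapley_values(["longer_legs", "wider_feet", "heavier_torso"],
                     toy_win_rate))
```

Under this toy value function, the synergy is split evenly (about 0.15 each for the legs and feet) while the torso change receives a negative value, illustrating the "fair contribution in the presence of synergies" property the abstract refers to.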
