Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning

The physical design of a robot and the policy that controls its motion are inherently coupled, and should be determined according to the task and environment. In an increasing number of applications, data-driven and learning-based approaches, such as deep reinforcement learning, have proven effective at designing control policies. For most tasks, the only way to evaluate a physical design with respect to such control policies is empirical—i.e., by picking a design and training a control policy for it. Since training these policies is time-consuming, it is computationally infeasible to train separate policies for all possible designs as a means to identify the best one. In this work, we address this limitation by introducing a method that jointly optimizes over the physical design and control network. Our approach maintains a distribution over designs and uses reinforcement learning to optimize a control policy to maximize expected reward over the design distribution. We give the controller access to design parameters to allow it to tailor its policy to each design in the distribution. Throughout training, we shift the distribution towards higher-performing designs, eventually converging to a design and control policy that are jointly optimal. We evaluate our approach in the context of legged locomotion, and demonstrate that it discovers novel designs and walking gaits, outperforming baselines across different settings.

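The loop described above can be made concrete with a short sketch. The following is an illustrative toy, not the authors' implementation: it keeps a diagonal Gaussian over design parameters, collapses the physics simulator and the design-conditioned control network into a placeholder reward function, and shifts the distribution with a simple cross-entropy-style elite update in place of the paper's policy-gradient training. All names, dimensions, and the fictitious "target" design are assumptions made for illustration.

```python
"""Minimal sketch of the joint design/control loop from the abstract.

Assumptions (not from the paper): episode_reward() stands in for
simulating a sampled design under the shared, design-conditioned
policy; the distribution update is a cross-entropy-style elite step
rather than the reinforcement-learning update the paper uses.
"""
import numpy as np

rng = np.random.default_rng(0)
DESIGN_DIM = 4                      # e.g., leg-segment lengths (illustrative)

# Distribution over designs: a diagonal Gaussian.
mu = np.ones(DESIGN_DIM)            # mean design
sigma = 0.5 * np.ones(DESIGN_DIM)   # per-parameter standard deviation


def episode_reward(design: np.ndarray) -> float:
    """Placeholder for rolling out the design-conditioned controller
    on this design in a physics simulator and returning its reward."""
    target = np.array([1.5, 0.8, 1.2, 1.0])   # fictitious best design
    return float(-np.sum((design - target) ** 2))


for step in range(100):
    # 1. Sample a population of designs from the current distribution.
    designs = mu + sigma * rng.standard_normal((64, DESIGN_DIM))

    # 2. Evaluate each sample. In the paper, a single controller is
    #    trained across all samples and receives the design parameters
    #    as input, so experience is shared over the whole distribution.
    rewards = np.array([episode_reward(d) for d in designs])

    # 3. Shift the distribution toward the highest-performing designs.
    elite = designs[np.argsort(rewards)[-8:]]
    mu = elite.mean(axis=0)
    sigma = elite.std(axis=0) + 1e-3   # floor keeps some exploration

print("converged design estimate:", np.round(mu, 3))
```

The key structural idea the sketch preserves is that the controller is shared across the design distribution and conditioned on the sampled design parameters, so narrowing the distribution toward high-reward designs and improving the controller can proceed jointly rather than by training a separate policy per candidate design.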