Bayesian Optimization in Variational Latent Spaces with Dynamic Compression

Data efficiency is crucial for autonomous robots that must adapt to new tasks and environments. In this work we focus on robotics problems with a budget of only 10-20 trials. This setting is challenging even for data-efficient approaches such as Bayesian optimization (BO), especially when optimizing higher-dimensional controllers. Simulated trajectories can be used to construct informed kernels for BO. However, previous work relied on supervised methods to extract low-dimensional features for these kernels. We propose a model and architecture for a sequential variational autoencoder that embeds the space of simulated trajectories into a lower-dimensional space of latent paths in an unsupervised way. We further compress the search space for BO by reducing exploration in undesirable parts of the state space, without requiring explicit constraints on controller parameters. We validate our approach with hardware experiments on a Daisy hexapod robot and an ABB YuMi manipulator. We also present simulation experiments with further comparisons to several baselines on Daisy and two manipulators. Our experiments indicate that the proposed trajectory-based kernel with dynamic compression can offer highly data-efficient optimization in this low-trial regime.
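
To make the idea of a trajectory-based kernel concrete, the following is a minimal sketch, not the implementation from this work. It assumes a hypothetical function `embed_to_latent_path` that stands in for a trained sequential VAE encoder (here replaced by a fixed random nonlinear map, purely for illustration), and it computes a squared-exponential kernel on distances between latent paths rather than between raw controller parameters. The names, the UCB acquisition rule, and the toy reward are all assumptions; the dynamic compression step is omitted.

```python
import numpy as np

def embed_to_latent_path(params, n_steps=5, latent_dim=2):
    """Hypothetical stand-in for a trained encoder (assumption, not the paper's model).

    Maps controller parameters of shape (n, d) to flattened latent paths of
    shape (n, n_steps * latent_dim) using a fixed random map followed by tanh.
    """
    rng = np.random.default_rng(0)  # fixed seed: the same map on every call
    W = rng.standard_normal((params.shape[-1], n_steps * latent_dim))
    return np.tanh(params @ W)

def latent_path_kernel(X1, X2, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel on distances between latent paths."""
    Z1 = embed_to_latent_path(X1)
    Z2 = embed_to_latent_path(X2)
    sq_dist = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def gp_posterior(X_train, y_train, X_test, noise_var=1e-4):
    """Standard GP regression posterior mean and variance at X_test."""
    K = latent_path_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = latent_path_kernel(X_test, X_train)
    K_ss_diag = np.diag(latent_path_kernel(X_test, X_test))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = K_ss_diag - (v ** 2).sum(axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative values from round-off

# Toy usage: 5 evaluated controllers with 4 parameters each (made-up data).
rng = np.random.default_rng(1)
X_train = rng.uniform(-1.0, 1.0, size=(5, 4))
y_train = -np.sum(X_train ** 2, axis=1)          # toy reward, illustration only
candidates = rng.uniform(-1.0, 1.0, size=(200, 4))

mean, var = gp_posterior(X_train, y_train, candidates)
ucb = mean + 2.0 * np.sqrt(var)                  # upper confidence bound acquisition
next_params = candidates[np.argmax(ucb)]         # controller to try on the next trial
```

In this sketch, two controllers count as similar when their latent paths are close, even if their parameters differ substantially. That is the intuition behind replacing a parameter-space kernel with a trajectory-informed one.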
