Developmental Bayesian Optimization of Black-Box with Visual Similarity-Based Transfer Learning

We present a developmental framework based on a long-term memory and reasoning mechanisms (Visual Similarity and Bayesian Optimization). This architecture allows a robot to autonomously optimize the hyper-parameters of any action and/or vision module, treated as a black box. Learning can take advantage of past experiences (stored in the episodic and procedural memories) to warm-start the exploration with a set of hyper-parameters previously optimized for objects similar to the new, unknown one (stored in a semantic memory). As an example, the system has been used to optimize 9 continuous hyper-parameters of a professional software package (Kamido), both in simulation and with a real robot (an industrial Fanuc robotic arm), with a total of 13 different objects. The robot is able to find a good object-specific optimization in 68 (simulation) or 40 (real) trials. In simulation, we demonstrate the benefit of transfer learning based on visual similarity, as opposed to amnesic learning (i.e. learning from scratch every time). Moreover, with the real robot, we show that the method consistently outperforms manual optimization by an expert, requiring less than 2 hours of training time to achieve a success rate of more than 88%.
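The warm-start idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the memory contents, the 3-dimensional feature vectors, the Euclidean similarity measure, and the names `MEMORY`, `most_similar`, and `initial_design` are all assumptions made for illustration. The sketch only builds the initial set of trial points (transferred parameters plus random points); the Bayesian optimization loop that would consume them is omitted.

```python
import math
import random

# Hypothetical semantic memory (illustrative values only):
# object name -> (visual feature vector, best hyper-parameters found so far).
MEMORY = {
    "bolt":   ([0.9, 0.1, 0.2], [0.30, 0.70, 0.50]),
    "washer": ([0.1, 0.8, 0.3], [0.60, 0.20, 0.40]),
}

def most_similar(features, memory):
    """Return the name of the stored object closest to `features`.

    Euclidean distance is an assumed stand-in for the visual similarity measure.
    """
    return min(memory, key=lambda name: math.dist(features, memory[name][0]))

def initial_design(features, memory, n_random, dim, rng):
    """Build the first trial points for a new object.

    If the memory is non-empty, the first point is the parameter set of the
    most similar known object (warm start); the rest are uniform random
    points in [0, 1]^dim. With an empty memory this reduces to the
    "amnesic" case of starting from scratch.
    """
    design = []
    if memory:
        design.append(list(memory[most_similar(features, memory)][1]))
    design += [[rng.random() for _ in range(dim)] for _ in range(n_random)]
    return design

rng = random.Random(0)
new_object = [0.85, 0.15, 0.25]  # hypothetical features of an unseen object
design = initial_design(new_object, MEMORY, n_random=4, dim=3, rng=rng)
```

In this sketch the new object is closest to the stored "bolt" entry, so its parameters become the first trial point; an optimizer would then evaluate the design and refine from there.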
