A Sample-Efficient Black-Box Optimizer to Train Policies for Human-in-the-Loop Systems With User Preferences

We present a new algorithm for optimizing control policies for human-in-the-loop systems based on qualitative preference feedback. This method is especially applicable to systems such as lower-limb prostheses and exoskeletons, for which it is difficult to define an objective function, hard to identify a model, and costly to repeat hardware experiments. To address these challenges, we combine and extend an algorithm for learning from preferences with the Predictive Entropy Search Bayesian optimization method. The resulting algorithm, Predictive Entropy Search with Preferences (PES-P), solicits preferences between pairs of control parameter sets chosen to maximally reduce uncertainty in the distribution of objective-function optima, thereby requiring the fewest experiments. We find that this algorithm outperforms the expected improvement method (EI) and random comparisons via Latin hypercubes (LH) in three simulation tests, ranging from optimizing randomly generated functions to tuning control parameters of linear systems and of a walking model. Furthermore, in a pilot study on the control of a robotic transfemoral prosthesis, we find that PES-P identifies good control parameters more quickly and more consistently than EI or LH given real user preferences. These results suggest the proposed algorithm can help engineers optimize certain robotic systems more accurately, efficiently, and consistently.
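The paper itself derives the full PES-P acquisition; as a rough illustration of the kind of loop the abstract describes, the sketch below pairs a Gaussian-process preference model (probit likelihood fit by a Laplace approximation, in the spirit of Chu and Ghahramani's preference learning) with a much simpler acquisition that just queries the most uncertain duel. The 1-D grid, kernel settings, noise scale, and simulated user are all illustrative assumptions, and the outcome-entropy acquisition is a crude stand-in for the actual predictive entropy search criterion, not the authors' method.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Candidate control-parameter settings (hypothetical 1-D toy problem).
X = np.linspace(0.0, 1.0, 30)
SIGMA = 0.2  # assumed preference-noise scale

def rbf_kernel(a, b, ell=0.15, sf=1.0):
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

K = rbf_kernel(X, X) + 1e-6 * np.eye(len(X))
K_inv = np.linalg.inv(K)

def laplace_posterior(duels, iters=50):
    """Laplace approximation to the GP posterior over latent utilities f,
    given duels (i, j) meaning setting i was preferred to setting j
    (probit likelihood, after Chu & Ghahramani, 2005)."""
    f = np.zeros(len(X))
    for _ in range(iters):
        grad = -K_inv @ f                # gradient of the log prior
        W = K_inv.copy()                 # negative Hessian of log posterior
        for i, j in duels:
            z = (f[i] - f[j]) / (np.sqrt(2) * SIGMA)
            r = norm.pdf(z) / max(norm.cdf(z), 1e-12)
            grad[i] += r / (np.sqrt(2) * SIGMA)
            grad[j] -= r / (np.sqrt(2) * SIGMA)
            c = r * (r + z) / (2 * SIGMA**2)
            W[i, i] += c; W[j, j] += c
            W[i, j] -= c; W[j, i] -= c
        step = np.linalg.solve(W, grad)  # Newton step toward the mode
        f += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return f, np.linalg.inv(W)           # posterior mean and covariance

def next_duel(mean, cov):
    """Pick the pair whose preference outcome is most uncertain: a simple
    information-seeking stand-in for the PES-P acquisition."""
    best, best_h = None, -1.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            v = cov[i, i] + cov[j, j] - 2 * cov[i, j] + 2 * SIGMA**2
            p = norm.cdf((mean[i] - mean[j]) / np.sqrt(v))
            p = min(max(p, 1e-12), 1 - 1e-12)
            h = -p * np.log(p) - (1 - p) * np.log(1 - p)  # outcome entropy
            if h > best_h:
                best, best_h = (i, j), h
    return best

# Simulated user: noisy comparisons of a hidden utility (optimum at x = 0.6).
true_util = lambda x: -(x - 0.6)**2
def user_prefers(i, j):
    return (true_util(X[i]) + SIGMA * rng.standard_normal()
            > true_util(X[j]) + SIGMA * rng.standard_normal())

duels = [(0, len(X) - 1) if user_prefers(0, len(X) - 1) else (len(X) - 1, 0)]
for _ in range(15):
    mean, cov = laplace_posterior(duels)
    i, j = next_duel(mean, cov)
    duels.append((i, j) if user_prefers(i, j) else (j, i))

mean, _ = laplace_posterior(duels)
print(f"estimated best setting: x = {X[np.argmax(mean)]:.3f}")
```

Even this simplified loop reflects the sample-efficiency argument in the abstract: each query is a cheap pairwise comparison rather than a numeric score, and the model's posterior over latent utilities concentrates near the true optimum after a handful of duels.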
