Adapting control policies for expensive systems to changing environments

Many controlled systems must operate over a range of external conditions. In this paper, we focus on the problem of learning a policy that adapts a system's controller to the current values of these external conditions so that the system always performs well (i.e., maximizes its output). We are further concerned with systems for which experiments are expensive to run, and we therefore restrict the number that may be run during training. We formally define the problem setup and the notion of an optimal control policy, and we propose two algorithms that seek such a policy while minimizing the number of system output evaluations. We present results comparing these algorithms with several other approaches and discuss the tradeoffs inherent in the proposed algorithms. Finally, we use these methods to train both simulated and physical snake robots to automatically adapt to changing terrain, and demonstrate improved performance on test courses with changing environments.
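
The abstract does not specify the two proposed algorithms, but its cited machinery of Gaussian-process surrogates and expected improvement suggests one plausible realization. The Python sketch below is an assumed illustration of that style of approach, not the authors' method: it fits a GP over the joint (environment, controller-parameter) space and spends a small evaluation budget per observed environment via an expected-improvement rule. The names run_system, candidates, and the data layout are hypothetical placeholders.

# Minimal sketch (assumed approach, not the paper's algorithm): adapt a
# controller to an observed environment by spending a small budget of
# expensive evaluations, guided by a Gaussian-process surrogate and an
# expected-improvement acquisition over the joint (environment, theta) space.
# run_system, candidates, and the data layout are illustrative only.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor


def expected_improvement(mu, sigma, best):
    """Expected improvement of predicted outputs over the best output seen."""
    sigma = np.maximum(sigma, 1e-9)           # avoid division by zero
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)


def adapt_controller(run_system, env, candidates, X, y, budget=10):
    """Tune controller parameters for environment env using budget trials.

    run_system(env, theta) -> scalar system output (one expensive experiment)
    env                    -> observed external condition, shape (k,)
    candidates             -> candidate controller parameters, shape (m, d)
    X, y                   -> past inputs [env, theta], shape (n, k+d), and outputs, shape (n,)
    """
    gp = GaussianProcessRegressor(normalize_y=True)
    query = np.hstack([np.tile(env, (len(candidates), 1)), candidates])
    for _ in range(budget):
        gp.fit(X, y)
        mu, sigma = gp.predict(query, return_std=True)
        theta = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
        out = run_system(env, theta)           # expensive system evaluation
        X = np.vstack([X, np.hstack([env, theta])])
        y = np.append(y, out)
    gp.fit(X, y)                               # refit on all data, then commit
    return candidates[np.argmax(gp.predict(query))], X, y

Under these assumptions, the returned parameter vector is the surrogate's best prediction for the current environment after the budget is exhausted; the accumulated (X, y) data would then be reused when the environment changes again.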
