The Cross-Entropy Method Optimizes for Quantiles

Cross-entropy optimization (CE) has proven to be a powerful tool for search in control environments. In the basic scheme, a distribution over proposed solutions is repeatedly adapted by evaluating a sample of solutions and refocusing the distribution on the fixed fraction of samples with the highest scores. We show that, in the kind of noisy evaluation environments common in decision-making domains, this percentage-based refocusing does not optimize the expected utility of solutions but rather a quantile of the score distribution. We provide a variant of CE (Proportional CE) that effectively optimizes the expected value. Using variants of established noisy environments, we show that Proportional CE can be used as a drop-in replacement for CE and can improve solution quality.
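For context, the sketch below illustrates the basic CE scheme the abstract describes: sample candidates from a parametric distribution, evaluate each one (a noisy draw in stochastic environments), keep the top elite fraction by observed score, and refit the distribution on that elite set. This is a minimal, illustrative implementation of standard quantile-based CE, not the paper's Proportional CE variant; names such as score_fn, elite_frac, and noise_std are assumptions introduced here for the example.

```python
import numpy as np

def cross_entropy_method(score_fn, dim, n_iters=50, pop_size=100,
                         elite_frac=0.1, noise_std=0.0, seed=0):
    """Basic cross-entropy search over a diagonal Gaussian distribution.

    At each iteration the top `elite_frac` of sampled solutions (ranked by
    their observed, possibly noisy scores) are used to refit the mean and
    standard deviation. This elite-fraction refocusing is the step the
    abstract argues optimizes a quantile of the score distribution rather
    than its expectation when evaluations are noisy.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.ones(dim)
    n_elite = max(1, int(round(elite_frac * pop_size)))

    for _ in range(n_iters):
        # Sample candidate solutions from the current distribution.
        samples = mean + std * rng.standard_normal((pop_size, dim))
        # Evaluate each candidate once; with noisy score_fn this single
        # evaluation is a random draw, which is where the quantile effect arises.
        scores = np.array([score_fn(x) for x in samples])
        # Keep the elite fraction with the highest observed scores.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the distribution on the elite set; the optional noise_std term
        # follows the noisy-CE trick used for Tetris (Szita and Lörincz, 2006).
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + noise_std
    return mean
```

A typical use is to pass a score_fn that runs one rollout of a control policy parameterized by the candidate vector and returns its return; the returned mean then serves as the optimized policy parameters.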
