Using Machine Learning to Improve Stochastic Optimization

In many stochastic optimization algorithms, a hyperparameter controls how the next sampling distribution is determined from the current data set of samples of the objective function. This hyperparameter governs the exploration/exploitation trade-off of the next sample. Typically, heuristic rules of thumb are used to set it, e.g., a fixed annealing schedule. We show how machine learning provides more principled alternatives for setting that hyperparameter adaptively, and demonstrate that these alternatives can substantially improve optimization performance.
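To make the idea concrete, here is a minimal sketch (not the paper's actual algorithm) of a cross-entropy-style optimizer with Gaussian sampling, in which the smoothing hyperparameter alpha — the exploration/exploitation knob — is chosen at each iteration by held-out log-likelihood rather than a fixed schedule. All names and the particular candidate grid of alphas are illustrative assumptions:

```python
import numpy as np

def cross_entropy_minimize(f, dim=2, n_samples=100, elite_frac=0.2,
                           n_iters=50, alphas=(0.3, 0.6, 0.9), seed=0):
    """Cross-entropy method sketch: instead of a fixed annealing schedule,
    the smoothing hyperparameter alpha is selected each iteration by the
    Gaussian log-likelihood it assigns to held-out elite samples."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), 2.0 * np.ones(dim)
    n_elite = max(4, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # Sample from the current distribution and keep the best points.
        x = rng.normal(mu, sigma, size=(n_samples, dim))
        idx = np.argsort([f(xi) for xi in x])[:n_elite]
        elites = x[idx]
        # Split elites into a fitting half and a held-out half.
        half = n_elite // 2
        fit, held = elites[:half], elites[half:]
        mu_fit = fit.mean(axis=0)
        sigma_fit = fit.std(axis=0) + 1e-8
        # Pick the alpha whose smoothed parameters best explain held-out elites.
        best_alpha, best_ll = alphas[0], -np.inf
        for a in alphas:
            m = a * mu_fit + (1 - a) * mu
            s = a * sigma_fit + (1 - a) * sigma
            # Diagonal-Gaussian log-likelihood (constants dropped).
            ll = -0.5 * np.sum(((held - m) / s) ** 2 + 2.0 * np.log(s))
            if ll > best_ll:
                best_alpha, best_ll = a, ll
        mu = best_alpha * mu_fit + (1 - best_alpha) * mu
        sigma = best_alpha * sigma_fit + (1 - best_alpha) * sigma
    return mu
```

The held-out likelihood acts as a data-driven proxy for how aggressively the sampling distribution should move toward the current elites, which is the kind of adaptive, learning-based hyperparameter choice the abstract describes.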
