A regret-based approach to non-stationary discrete stochastic optimization

This paper presents regret-based adaptive search algorithms for non-stationary simulation-based discrete optimization problems. The proposed algorithm relies on a potential-based sampling strategy that prescribes how to take successive samples from the search space. Performance analysis shows that the randomly evolving set of global optimizers is properly tracked, in the sense that the worst-case regret can be kept arbitrarily small infinitely often. The proposed scheme allows temporal correlation in the simulation data and imposes no structural assumptions on the objective function. Numerical examples show gains in both convergence speed and efficiency compared with random search, simulated annealing, and pure exploration methods.
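The abstract does not spell out the potential-based sampling rule itself. As a hedged illustration of the general regret-based idea only, the sketch below adapts classical regret matching (in the spirit of Hart and Mas-Colell's adaptive heuristics) to a noisy discrete minimization problem: candidates whose estimated cost has been lower than the observed losses accumulate positive regret and are sampled proportionally more often. The function name `regret_matching_search`, the specific update rule, and all parameters are illustrative assumptions, not the authors' algorithm.

```python
import random


def regret_matching_search(objective, candidates, n_iters=3000, seed=0):
    """Regret-matching-style sampling over a finite candidate list (a sketch,
    not the paper's algorithm). `objective(x)` returns a noisy cost sample at
    x; lower is better. Returns the candidate with the best running average.
    """
    rng = random.Random(seed)
    avg, count = {}, {}
    # Initialize by sampling every candidate once.
    for x in candidates:
        avg[x] = objective(x)
        count[x] = 1
    regret = {k: 0.0 for k in candidates}

    for _ in range(n_iters):
        total = sum(max(r, 0.0) for r in regret.values())
        if total > 0:
            # Sample proportionally to the positive part of the regrets.
            u = rng.random() * total
            x = candidates[-1]  # fallback guards against float rounding
            for k in candidates:
                u -= max(regret[k], 0.0)
                if u <= 0:
                    x = k
                    break
        else:
            # No candidate shows positive regret: explore uniformly.
            x = rng.choice(candidates)

        y = objective(x)
        count[x] += 1
        avg[x] += (y - avg[x]) / count[x]  # running-average cost update
        # Regret of not having played k instead of x: positive when k's
        # estimated cost is below the loss just observed.
        for k in candidates:
            regret[k] += y - avg[k]

    return min(candidates, key=lambda k: avg[k])
```

This sketch handles noise through running averages but, unlike the paper's scheme, makes no provision for a time-varying optimizer set; tracking a non-stationary optimum would additionally require discounting or windowing the averages and regrets.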
