Paper information - Towards optimization of a human-inspired heuristic for solving explore-exploit problems

Towards optimization of a human-inspired heuristic for solving explore-exploit problems

Motivated by models of human decision making, we consider a heuristic solution for explore-exploit problems. In a numerical example, we show that the algorithm performs well when its parameter values are chosen appropriately. However, the parameters trade off exploration against exploitation in a complicated way, so the optimal parameter values are not obvious. We show that the optimal parameter values can be computed analytically in some cases and prove that suboptimal parameter tunings can provide robustness to modeling error. The analytic results suggest a feedback control law for dynamically optimizing the parameters.
