Resource Allocation in Multi-armed Bandit Exploration: Overcoming Nonlinear Scaling with Adaptive Parallelism

We study exploration in stochastic multi-armed bandits when we have access to a divisible resource and can allocate varying amounts of this resource to arm pulls. By allocating more resources to a pull, we can compute its outcome faster and use it to inform subsequent decisions about which arms to pull. However, since distributed environments do not scale linearly, executing several arm pulls in parallel, and hence allocating fewer resources to each pull, may result in better throughput. For example, in simulation-based scientific studies, an expensive simulation can be sped up by running it on multiple cores. This speed-up, however, is partly offset by communication among cores and other overheads, so throughput is lower than if fewer cores were allocated to each simulation and more trials were run in parallel. We explore these trade-offs in the fixed confidence setting, where we need to find the best arm with a given success probability while minimizing the time to do so. We propose an algorithm that trades off between information accumulation and throughput, and show that the time taken can be upper bounded by the solution of a dynamic program whose inputs are the squared gaps between the suboptimal and optimal arms. We prove a matching hardness result which demonstrates that this dynamic program is fundamental to the problem. Next, we propose and analyze an algorithm for the fixed deadline setting, where we are given a time deadline and need to maximize the probability of finding the best arm. We corroborate these theoretical insights with an empirical evaluation.
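To make the latency/throughput trade-off concrete, here is a toy Python sketch. It is not the authors' algorithm. It assumes an Amdahl's-law speedup model with a hypothetical parallel fraction `p`, assumes every pull has the same single-core runtime `base_time`, and assumes pulls run in synchronous waves. The hypothetical helper `best_cores_per_pull` simply picks the number of cores per pull that finishes a fixed batch of pulls soonest.

```python
import math


def amdahl_speedup(m: int, p: float) -> float:
    """Speedup of a single pull on m cores under Amdahl's law (assumed scaling model).

    p is the fraction of the work that parallelizes; the remaining (1 - p) is serial.
    """
    return 1.0 / ((1.0 - p) + p / m)


def batch_time(num_pulls: int, m: int, total_cores: int, p: float,
               base_time: float = 1.0) -> float:
    """Wall-clock time to finish num_pulls pulls when each pull gets m cores.

    Pulls run in waves of floor(total_cores / m) concurrent pulls, and each
    pull takes base_time / speedup(m) time units.
    """
    slots = total_cores // m                 # concurrent pulls per wave
    waves = math.ceil(num_pulls / slots)     # number of sequential waves
    return waves * base_time / amdahl_speedup(m, p)


def best_cores_per_pull(num_pulls: int, total_cores: int, p: float) -> int:
    """Cores per pull in 1..total_cores that minimizes batch_time.

    min() returns the first minimizer, so ties go to the smaller allocation.
    """
    return min(range(1, total_cores + 1),
               key=lambda m: batch_time(num_pulls, m, total_cores, p))


if __name__ == "__main__":
    # Many outstanding pulls favor parallelism; a single pull favors all cores.
    for n in (64, 4, 1):
        print(n, best_cores_per_pull(n, total_cores=16, p=0.9))
```

With 16 cores and `p = 0.9`, this prints `64 1`, `4 4`, and `1 16`: a large batch of pulls finishes fastest with one core per pull, a small batch with an intermediate allocation, and a single pull with all 16 cores. This mirrors the intuition in the abstract: when many pulls are outstanding, parallelism across pulls wins, and when few are outstanding, concentrating resources returns information sooner. The paper's actual allocation rule is tied to a dynamic program over the squared gaps, which this sketch does not reproduce.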

Michael I. Jordan | Kirthevasan Kandasamy | Ion Stoica | Brijen Thananjeyan | Joseph E. Gonzalez | Ken Goldberg