Pure Exploration of Multi-Armed Bandits with Heavy-Tailed Payoffs

Motivated by the prevalence of heavy-tailed distributions in practical scenarios, we investigate the problem of pure exploration in Multi-Armed Bandits (MAB) with heavy-tailed payoffs. We relax the standard assumption of sub-Gaussian payoff noise in MAB and instead assume that the stochastic payoffs have finite p-th moments, where p ∈ (1, +∞). The main contributions of this paper are three-fold. First, we analyze, via martingale techniques, the tail probabilities of the empirical average and the truncated empirical average (TEA) for estimating expected payoffs in sequential decisions with heavy-tailed noise. Second, we propose two effective bandit algorithms, based on different prior information (i.e., fixed confidence or fixed budget), for pure exploration of MAB with payoffs having finite p-th moments. Third, we derive theoretical guarantees for the two proposed algorithms and demonstrate their effectiveness for pure exploration of MAB with heavy-tailed payoffs on both synthetic data and real-world financial data.
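Although the paper's full algorithms are not reproduced here, the TEA estimator at the heart of the analysis is easy to illustrate. Below is a minimal sketch, assuming the standard truncation rule from the heavy-tailed bandit literature: a sample is kept only if its magnitude stays below a threshold that grows with its index. The function name truncated_empirical_average, the moment bound u on E[|X|^p], and the confidence parameter delta are illustrative assumptions, not the paper's notation.

```python
import math
import random

def truncated_empirical_average(samples, u, delta, p):
    """Truncated empirical average (TEA) of heavy-tailed samples.

    Each sample X_s is zeroed out if |X_s| exceeds a threshold
    (u * s / log(1/delta))^(1/p) that grows with the sample index s,
    so only a moment bound E[|X|^p] <= u is needed to control the
    estimator's tail probabilities.
    """
    n = len(samples)
    total = 0.0
    for s, x in enumerate(samples, start=1):
        threshold = (u * s / math.log(1.0 / delta)) ** (1.0 / p)
        if abs(x) <= threshold:  # discard extreme observations
            total += x
    return total / n

# Toy usage: Pareto payoffs are heavy-tailed, so the plain empirical
# average is volatile while the TEA stays close to the true mean.
random.seed(0)
payoffs = [random.paretovariate(1.5) for _ in range(10000)]
print(truncated_empirical_average(payoffs, u=10.0, delta=0.01, p=1.2))
print(sum(payoffs) / len(payoffs))
```

The paper's fixed-confidence and fixed-budget algorithms presumably build confidence intervals around estimates of this kind; the exact thresholds and stopping rules there depend on the available prior information.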
