The Multi-Armed Bandit Problem under Delayed Rewards Conditions in Digital Campaign Management

In this paper, we consider a digital marketing content recommendation system, known as campaign management, that marketers use to create specific digital content. This content can be issued or configured for viewing by particular population segments according to a set of business variables and to users' profiles or behavior. Using a numerical study based on discrete event simulation, we analyze the most representative allocation strategies for the multi-armed bandit problem in a setting with delayed rewards. For incorporating the feedback on the different contents displayed to users, we consider both batch-mode and online-update architectures.
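The setting described above (contents as arms, clicks arriving after a delay, feedback applied either online or in batches) can be illustrated with a small discrete event simulation. The following is a minimal sketch under stated assumptions and is not the authors' simulator. It assumes Bernoulli click rewards, one impression per time step, and delays drawn uniformly from 1 to `max_delay` steps. It uses UCB1 and Beta-Bernoulli Thompson sampling, two well-known strategies chosen here only as examples. All names (`simulate`, `batch_period`, `max_delay`, the click rates) are hypothetical.

```python
import heapq
import math
import random


class UCB1:
    """UCB1 index policy computed only from rewards that have already been applied."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # applied (not merely issued) rewards per arm
        self.sums = [0.0] * n_arms    # sum of applied rewards per arm

    def select(self, rng):
        total = sum(self.counts)
        best, best_index = [], -math.inf
        for arm, (n, s) in enumerate(zip(self.counts, self.sums)):
            # Arms with no feedback yet get an infinite index so they are tried first.
            index = math.inf if n == 0 else s / n + math.sqrt(2 * math.log(total) / n)
            if index > best_index:
                best, best_index = [arm], index
            elif index == best_index:
                best.append(arm)
        return rng.choice(best)  # break ties at random

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


class ThompsonBernoulli:
    """Thompson sampling with independent Beta(1, 1) priors on each click rate."""

    def __init__(self, n_arms):
        self.alpha = [1.0] * n_arms   # 1 + applied clicks
        self.beta = [1.0] * n_arms    # 1 + applied non-clicks

    def select(self, rng):
        samples = [rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


def simulate(policy, click_rates, horizon, max_delay, batch_period=None, seed=0):
    """Simulate one hypothetical campaign and return its cumulative pseudo-regret.

    batch_period=None -> online update: each reward is applied as soon as it arrives.
    batch_period=k    -> batch update: arrived rewards are buffered, applied every k steps.
    """
    rng = random.Random(seed)
    best_rate = max(click_rates)
    pending = []   # min-heap of (arrival_time, issue_time, arm, reward) events
    buffer = []    # rewards that have arrived but have not been applied yet
    regret = 0.0
    for t in range(horizon):
        # Move every reward event due by time t from the event queue to the buffer.
        while pending and pending[0][0] <= t:
            _, _, arm, reward = heapq.heappop(pending)
            buffer.append((arm, reward))
        if batch_period is None or t % batch_period == 0:
            for arm, reward in buffer:
                policy.update(arm, reward)
            buffer.clear()
        # Show one content to the current user and schedule its delayed feedback.
        arm = policy.select(rng)
        reward = 1 if rng.random() < click_rates[arm] else 0
        heapq.heappush(pending, (t + rng.randint(1, max_delay), t, arm, reward))
        regret += best_rate - click_rates[arm]
    return regret


if __name__ == "__main__":
    rates = [0.2, 0.5, 0.8]  # hypothetical click-through rates of three contents
    for name, make in [("UCB1", UCB1), ("Thompson", ThompsonBernoulli)]:
        for mode in (None, 100):
            r = simulate(make(len(rates)), rates, horizon=5000, max_delay=50,
                         batch_period=mode, seed=1)
            print(name, "online" if mode is None else f"batch({mode})", round(r, 1))
```

One design point the sketch makes visible: in batch mode a deterministic index policy such as UCB1 keeps its indices frozen between updates, so it shows the same content for the whole batch (barring exact ties). A randomized policy such as Thompson sampling still varies its choices within a batch.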

Antonio Jiménez-Martín | Alfonso Mateos | Miguel Martín