AutoBandit: A Meta Bandit Online Learning System

Research on online multi-armed bandits (MAB) has grown rapidly in recent years: building on top of the classic bandit problem, novel problem settings and algorithms motivated by various practical applications are being studied. However, identifying the best bandit algorithm among many candidates for a given application is not only time-consuming but also reliant on human expertise, which limits the practicality of MAB. To alleviate this problem, this paper outlines an intelligent system called AutoBandit, which is equipped with many out-of-the-box MAB algorithms and automatically and adaptively chooses the best one, with suitable hyperparameters, online. This helps a growing application continuously maximize its cumulative reward over its whole life cycle. With a flexible architecture and user-friendly web-based interfaces, AutoBandit makes it convenient for users to integrate and monitor online bandits in a business system. At the time of publication, AutoBandit had been deployed in various industrial applications.
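The core idea of choosing among bandit algorithms online can itself be framed as a two-level ("meta") bandit: an outer learner picks which candidate algorithm acts in each round, and that candidate picks the arm. The sketch below illustrates this idea under assumptions of our own and is not AutoBandit's actual selection method, which the abstract does not specify. The outer learner is EXP3, chosen here because the candidates' reward rates shift as they learn. The candidates are epsilon-greedy learners with different epsilon values (standing in for "suitable hyperparameters") plus UCB1. The environment uses Bernoulli arms. All class and function names (`EpsilonGreedy`, `UCB1`, `Exp3Meta`, `run`) are hypothetical.

```python
import math
import random


class EpsilonGreedy:
    """Candidate: explore uniformly with probability eps, otherwise exploit the best mean."""

    def __init__(self, n_arms, eps, rng):
        self.eps, self.rng = eps, rng
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.means))
        return max(range(len(self.means)), key=lambda a: self.means[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]  # incremental mean


class UCB1:
    """Candidate: pull each arm once, then maximize mean + sqrt(2 ln t / n)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select(self):
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        return max(range(len(self.means)),
                   key=lambda a: self.means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class Exp3Meta:
    """Outer learner: EXP3 over candidate algorithms, rewards assumed in [0, 1]."""

    def __init__(self, n_candidates, gamma, rng):
        self.gamma, self.rng = gamma, rng
        self.log_w = [0.0] * n_candidates  # log-weights avoid overflow on long horizons

    def probs(self):
        k = len(self.log_w)
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        s = sum(w)
        # Mix the exponential weights with uniform exploration of rate gamma.
        return [(1 - self.gamma) * wi / s + self.gamma / k for wi in w]

    def select(self):
        p = self.probs()
        i = self.rng.choices(range(len(p)), weights=p)[0]
        return i, p[i]

    def update(self, i, reward, p_i):
        # Importance-weighted reward estimate, applied to the chosen candidate only.
        self.log_w[i] += self.gamma * (reward / p_i) / len(self.log_w)


def run(arm_probs, horizon=5000, seed=0):
    rng = random.Random(seed)
    k = len(arm_probs)
    candidates = [EpsilonGreedy(k, eps, rng) for eps in (0.01, 0.1, 0.3)] + [UCB1(k)]
    meta = Exp3Meta(len(candidates), gamma=0.1, rng=rng)
    total = 0.0
    for _ in range(horizon):
        i, p_i = meta.select()                                    # meta level: pick an algorithm
        arm = candidates[i].select()                              # base level: it picks an arm
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0    # Bernoulli environment
        candidates[i].update(arm, reward)
        meta.update(i, reward, p_i)
        total += reward
    return total, meta.probs()


total, p = run([0.2, 0.5, 0.7])
print(f"average reward: {total / 5000:.3f}")
print("final selection probabilities:", [round(x, 3) for x in p])
```

In this sketch, only the candidate that acted in a round is updated. A production system might instead share observations across candidates or use a different outer selector. The `gamma` floor keeps every candidate selectable, so the system can still switch algorithms if the application's reward distribution drifts.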