Online Multiclass Boosting with Bandit Feedback

We present online boosting algorithms for multiclass classification with bandit feedback, where the learner is told only whether its prediction was correct. We propose an unbiased estimate of the loss based on a randomized prediction, which allows the model to update its weak learners with limited information. Using this estimate, we extend two full-information boosting algorithms (Jung et al., 2017) to the bandit setting. We prove that the asymptotic error bounds of the bandit algorithms exactly match those of their full-information counterparts. The cost of restricted feedback is reflected in a larger sample complexity. Experimental results support our theoretical findings, and the proposed models perform comparably to an existing bandit boosting algorithm that is limited to binary weak learners.
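To make the idea of an unbiased estimate from a randomized prediction concrete, here is a minimal sketch of one common importance-weighting construction. It is an illustration under stated assumptions, not necessarily the estimator used in the paper: the function names, the exploration rate `rho`, and the choice to estimate the one-hot indicator of the true label are all hypothetical and do not come from the abstract.

```python
def exploration_distribution(y_hat, k, rho):
    """Mix the intended prediction y_hat with uniform exploration over k labels.

    Assumption: rho in (0, 1] is a hypothetical exploration rate.
    """
    p = [rho / k] * k
    p[y_hat] += 1.0 - rho
    return p


def estimate_label_indicator(y_tilde, correct, p):
    """Importance-weighted estimate of the one-hot vector of the true label.

    y_tilde is the label actually predicted (drawn from p), and `correct`
    is the single bit of bandit feedback: whether y_tilde was the true label.
    """
    est = [0.0] * len(p)
    if correct:
        # Only the sampled coordinate can be observed; reweight by 1 / p.
        est[y_tilde] = 1.0 / p[y_tilde]
    return est


def expected_estimate(y_true, p):
    """Exact expectation of the estimate over the draw y_tilde ~ p."""
    k = len(p)
    total = [0.0] * k
    for y_tilde in range(k):
        est = estimate_label_indicator(y_tilde, y_tilde == y_true, p)
        for i in range(k):
            total[i] += p[y_tilde] * est[i]
    return total
```

Averaged over the random draw, the estimate equals the one-hot vector of the true label, so any loss that is linear in that vector can be estimated without bias as well. The price is variance that grows as `rho` shrinks, which fits the abstract's remark that restricted feedback shows up as a larger sample complexity.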

[1] Haipeng Luo, et al. Optimal and Adaptive Algorithms for Online Boosting, 2015, ICML.

[2] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[3] Haipeng Luo, et al. Logistic Regression: The Importance of Being Improper, 2018, COLT.

[4] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.

[5] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[6] Hsuan-Tien Lin, et al. Boosting with Online Binary Learners for the Multiclass Bandit Problem, 2014, ICML.

[7] Ambuj Tewari, et al. Online Boosting Algorithms for Multi-label Ranking, 2017, AISTATS.

[8] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.

[9] Shai Ben-David, et al. Multiclass Learnability and the ERM principle, 2011, COLT.

[10] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[11] Ambuj Tewari, et al. Efficient bandit algorithms for online multiclass prediction, 2008, ICML '08.

[12] Hugo Fuks, et al. Wearable Computing: Accelerometers' Data Classification of Body Postures and Movements, 2012, SBIA.

[13] Ambuj Tewari, et al. Online multiclass boosting, 2017, NIPS.

[14] K. Cios, et al. Self-Organizing Feature Maps Identify Proteins Critical to Learning in a Mouse Model of Down Syndrome, 2015, PloS one.

[15] Francesco Orabona, et al. Efficient Online Bandit Multiclass Learning with Õ(√T) Regret, 2017, ICML.

[16] Hsuan-Tien Lin, et al. An Online Boosting Algorithm with Theoretical Justifications, 2012, ICML.

[17] Geoff Hulten, et al. Mining high-speed data streams, 2000, KDD '00.

[18] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[19] Robert E. Schapire, et al. A theory of multiclass boosting, 2010, J. Mach. Learn. Res.