Multiclass classification with bandit feedback using adaptive regularization

We present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit indicating whether the predicted label is correct or not, rather than the true label. Our algorithm is based on the second-order Perceptron, and uses upper-confidence bounds to trade-off exploration and exploitation, instead of random sampling as performed by most current algorithms. We analyze this algorithm in a partial adversarial setting, where instances are chosen adversarially, while the labels are chosen according to a linear probabilistic model which is also chosen adversarially. We show a regret of $\mathcal{O}(\sqrt{T}\log T)$, which improves over the current best bounds of $\mathcal{O}(T^{2/3})$ in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems and on four vowel recognition tasks, often obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.

[1]  Hui Lin,et al.  How to loose confidence: probabilistic linear machines for multiclass classification , 2009, INTERSPEECH.

[2]  Philippe Rigollet,et al.  Nonparametric Bandits with Covariates , 2010, COLT.

[3]  Koby Crammer,et al.  On the Learnability and Design of Output Codes for Multiclass Problems , 2002, Machine Learning.

[4]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.

[5]  Claudio Gentile,et al.  Robust bounds for classification via selective sampling , 2009, ICML '09.

[6]  Rong Jin,et al.  A Potential-based Framework for Online Multi-class Learning with Partial Feedback , 2010, AISTATS.

[7]  Arthur E. Hoerl,et al.  Ridge Regression: Biased Estimation for Nonorthogonal Problems , 2000, Technometrics.

[8]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[9]  Michael L. Littman,et al.  Online Linear Regression and Its Application to Model-Based Reinforcement Learning , 2007, NIPS.

[10]  Richard Wright,et al.  The vocal joystick data collection effort and vowel corpus , 2006, INTERSPEECH.

[11]  Koby Crammer,et al.  Multi-Class Confidence Weighted Algorithms , 2009, EMNLP.

[12]  Thomas J. Walsh,et al.  Exploring compact reinforcement-learning representations with linear regression , 2009, UAI.

[13]  Koby Crammer,et al.  Confidence-weighted linear classification , 2008, ICML '08.

[14]  Ambuj Tewari,et al.  Efficient bandit algorithms for online multiclass prediction , 2008, ICML '08.

[15]  Yoram Singer,et al.  Efficient projections onto the l1-ball for learning in high dimensions , 2008, ICML '08.

[16]  Claudio Gentile,et al.  A Second-Order Perceptron Algorithm , 2002, SIAM J. Comput..

[17]  J. Langford,et al.  The Epoch-Greedy algorithm for contextual multi-armed bandits , 2007, NIPS 2007.

[18]  Ambuj Tewari,et al.  On the Generalization Ability of Online Strongly Convex Programming Algorithms , 2008, NIPS.

[19]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[20]  Elad Hazan,et al.  Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction , 2011, NIPS.

[21]  Rong Jin,et al.  Learning to trade off between exploration and exploitation in multiclass bandit prediction , 2011, KDD.

[22]  Peter Auer,et al.  Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..

[23]  Manfred K. Warmuth,et al.  Relative Loss Bounds for On-Line Density Estimation with the Exponential Family of Distributions , 1999, Machine Learning.

[24]  Jun Liu,et al.  Efficient Euclidean projections in linear time , 2009, ICML '09.

[25]  Thomas P. Hayes,et al.  Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.

[26]  Koby Crammer,et al.  Adaptive regularization of weight vectors , 2009, Machine Learning.

[27]  Tyler Lu,et al.  Showing Relevant Ads via Lipschitz Context Multi-Armed Bandits , 2010 .

[28]  H. Vincent Poor,et al.  Bandit problems with side observations , 2005, IEEE Transactions on Automatic Control.

[29]  Koby Crammer,et al.  Ultraconservative Online Algorithms for Multiclass Problems , 2001, J. Mach. Learn. Res..