Paper information - Multiclass classification with bandit feedback using adaptive regularization

We present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit indicating whether the predicted label is correct or not, rather than the true label. Our algorithm is based on the second-order Perceptron, and uses upper-confidence bounds to trade off exploration and exploitation, rather than the random sampling used by most current algorithms. We analyze this algorithm in a partially adversarial setting, where instances are chosen adversarially, while the labels are chosen according to a linear probabilistic model which is also chosen adversarially. We show a regret of $\mathcal{O}(\sqrt{T}\log T)$, which improves over the current best bounds of $\mathcal{O}(T^{2/3})$ in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems and on four vowel recognition tasks, often obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.
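The bandit protocol described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a simplified per-class regularized least-squares learner with an upper-confidence bonus, written only to show how one feedback bit can drive both the update and the exploration. The class name `UCBBanditClassifier`, the exploration weight `alpha`, the helper `run_demo`, and the one-hot toy stream are all assumptions introduced for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


class UCBBanditClassifier:
    """Hypothetical sketch: one ridge-regression model per class plus a confidence bonus."""

    def __init__(self, n_classes, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed hyperparameter)
        # Per-class inverse correlation matrix, started at the identity (ridge term).
        self.A_inv = [
            [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]
            for _ in range(n_classes)
        ]
        # Per-class sum of instances weighted by the +1/-1 feedback.
        self.b = [[0.0] * dim for _ in range(n_classes)]

    def score(self, k, x):
        Ax = matvec(self.A_inv[k], x)
        mean = dot(self.b[k], Ax)                # estimated margin for class k
        width = math.sqrt(max(dot(x, Ax), 0.0))  # confidence width for class k
        return mean + self.alpha * width         # optimistic (upper-confidence) score

    def predict(self, x):
        # Deterministic choice: the class with the largest optimistic score.
        return max(range(len(self.b)), key=lambda k: self.score(k, x))

    def update(self, x, predicted, correct):
        # The feedback bit only concerns the predicted class, so only it is updated.
        target = 1.0 if correct else -1.0
        A_inv = self.A_inv[predicted]
        Ax = matvec(A_inv, x)
        denom = 1.0 + dot(x, Ax)
        # Sherman-Morrison rank-one update of the inverse of (A + x x^T).
        for r in range(len(x)):
            for c in range(len(x)):
                A_inv[r][c] -= Ax[r] * Ax[c] / denom
        self.b[predicted] = [bi + target * xi for bi, xi in zip(self.b[predicted], x)]


def run_demo(rounds=30, n_classes=3):
    """Toy stream: instance t is the one-hot vector of its true label t % n_classes."""
    learner = UCBBanditClassifier(n_classes, dim=n_classes)
    outcomes = []
    for t in range(rounds):
        y = t % n_classes
        x = [1.0 if i == y else 0.0 for i in range(n_classes)]
        y_hat = learner.predict(x)
        correct = y_hat == y  # the single bit of bandit feedback
        learner.update(x, y_hat, correct)
        outcomes.append(correct)
    return outcomes
```

In this sketch the true label is never shown to the learner; only `correct` is. Classes that have rarely been predicted on similar instances keep a large confidence width, so they are tried deterministically rather than through random sampling.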

Koby Crammer | Claudio Gentile

[1] Hui Lin, et al. How to loose confidence: probabilistic linear machines for multiclass classification, 2009, INTERSPEECH.

[2] Philippe Rigollet, et al. Nonparametric Bandits with Covariates, 2010, COLT.

[3] Koby Crammer, et al. On the Learnability and Design of Output Codes for Multiclass Problems, 2002, Machine Learning.

[4] John Blitzer, et al. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification, 2007, ACL.

[5] Claudio Gentile, et al. Robust bounds for classification via selective sampling, 2009, ICML '09.

[6] Rong Jin, et al. A Potential-based Framework for Online Multi-class Learning with Partial Feedback, 2010, AISTATS.

[7] Arthur E. Hoerl, et al. Ridge Regression: Biased Estimation for Nonorthogonal Problems, 2000, Technometrics.

[8] F. Rosenblatt, et al. The perceptron: a probabilistic model for information storage and organization in the brain, 1958, Psychological Review.

[9] Michael L. Littman, et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning, 2007, NIPS.

[10] Richard Wright, et al. The vocal joystick data collection effort and vowel corpus, 2006, INTERSPEECH.

[11] Koby Crammer, et al. Multi-Class Confidence Weighted Algorithms, 2009, EMNLP.