Online Multiclass Learning with "Bandit" Feedback under a Confidence-Weighted Approach

Data volumes have grown explosively in recent years, and learning methods are vital for extracting key information from such massive data. Traditional offline learning requires multiple passes over the dataset and therefore frequently runs short of computational resources. Online learning can reduce both the total training time and the computational resources required. However, online methods often converge slowly because they do not retain information from past examples. For the partial-feedback setting, we propose online Confidence-Weighted learning in the Bandit setting (CWB), which targets lower cumulative error and faster convergence. Specifically, historical information is preserved and used to adjust the feature weights, which speeds up convergence. Moreover, we integrate random sampling into confidence-weighted learning to balance exploitation and exploration in the bandit setting. Extensive experiments demonstrate the efficiency and effectiveness of the proposed scheme.
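The two ingredients named above, random sampling for exploration and per-feature confidence for the weight update, can be illustrated with a minimal sketch. This is not the CWB update itself, which the abstract does not specify. It assumes Banditron-style sampling (the top-scoring class is played with probability 1 - gamma, and the remaining gamma is spread uniformly over all classes) and a diagonal, AROW-style one-vs-rest update applied only to the played class. The names `sampling_distribution`, `DiagonalConfidenceBandit`, `gamma`, and `r` are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sampling_distribution(scores, gamma):
    """Assumed Banditron-style exploration: top class gets 1 - gamma extra mass."""
    k = len(scores)
    best = max(range(k), key=lambda i: scores[i])
    probs = [gamma / k] * k
    probs[best] += 1.0 - gamma
    return probs


class DiagonalConfidenceBandit:
    """Illustrative learner: one mean vector and one diagonal variance per class."""

    def __init__(self, n_classes, n_features, gamma=0.1, r=1.0, seed=0):
        self.mu = [[0.0] * n_features for _ in range(n_classes)]
        self.sigma = [[1.0] * n_features for _ in range(n_classes)]
        self.gamma = gamma  # exploration rate (assumed hyperparameter)
        self.r = r          # regularization strength (assumed hyperparameter)
        self.rng = random.Random(seed)

    def predict(self, x):
        """Sample the label to play; also return the sampling distribution."""
        scores = [dot(w, x) for w in self.mu]
        probs = sampling_distribution(scores, self.gamma)
        played = self.rng.choices(range(len(probs)), weights=probs)[0]
        return played, probs

    def update(self, x, played, correct):
        """Use the single bit of bandit feedback to update the played class only."""
        target = 1.0 if correct else -1.0
        margin = target * dot(self.mu[played], x)
        if margin >= 1.0:
            return  # already confident enough on this example
        var = [s * xi for s, xi in zip(self.sigma[played], x)]  # Sigma x
        beta = 1.0 / (dot(x, var) + self.r)
        alpha = (1.0 - margin) * beta
        # Low-variance (well-known) features move less; variance then shrinks.
        self.mu[played] = [m + alpha * target * v
                           for m, v in zip(self.mu[played], var)]
        self.sigma[played] = [s - beta * v * v
                              for s, v in zip(self.sigma[played], var)]
```

In this sketch the stored variances play the role of the preserved historical information: features that have been seen often get smaller variances and therefore smaller updates, while the `gamma` term keeps every class reachable so that feedback is still collected for labels the model currently ranks low.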
