Soft Confidence-Weighted Learning

Online learning plays an important role in many big data mining problems because of its high efficiency and scalability. In the literature, many online learning algorithms using gradient information have been applied to solve online classification problems. Recently, more effective second-order algorithms have been proposed, which exploit the correlation between features to improve learning efficiency. Among them, Confidence-Weighted (CW) learning algorithms are very effective: they assume that the classification model is drawn from a Gaussian distribution, so the model can be updated effectively with second-order information from the data stream. Despite being actively studied, these CW algorithms do not handle nonseparable or noisy datasets well. In this article, we propose a family of Soft Confidence-Weighted (SCW) learning algorithms for both binary and multiclass classification tasks. It is the first family of online classification algorithms that enjoys four salient properties simultaneously: (1) large margin training, (2) confidence weighting, (3) capability to handle nonseparable data, and (4) adaptive margin. Our experimental results show that the proposed SCW algorithms significantly outperform the original CW algorithm. Compared with a variety of state-of-the-art algorithms (including AROW, NAROW, and NHERD), SCW in general achieves better or at least comparable predictive performance while being considerably more efficient (i.e., using fewer updates and lower time cost). To facilitate future research, we release all the datasets and source code to the public at http://libol.stevenhoi.org/.
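
To make the idea of a Gaussian-distributed model with a soft, confidence-weighted update more concrete, the following is a minimal sketch of a binary learner in the style of SCW-I. It is not taken from the released code. The class name `SCW1`, the parameter names `eta` and `C`, their default values, and the closed-form coefficients (as commonly stated for SCW-I) are assumptions made for illustration and should be checked against the paper and the LIBOL source before any real use.

```python
import math
from statistics import NormalDist


class SCW1:
    """Illustrative sketch of a binary SCW-I style learner (hypothetical names).

    The model is a Gaussian N(mu, sigma) over weight vectors: mu is the mean
    weight vector and sigma is the full covariance matrix (confidence).
    """

    def __init__(self, dim, eta=0.9, C=1.0):
        # phi is the inverse normal CDF of the confidence level eta (assumed form).
        self.phi = NormalDist().inv_cdf(eta)
        self.psi = 1.0 + self.phi ** 2 / 2.0
        self.zeta = 1.0 + self.phi ** 2
        self.C = C  # aggressiveness cap that makes the margin constraint "soft"
        self.mu = [0.0] * dim
        self.sigma = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]

    def predict(self, x):
        score = sum(m * xi for m, xi in zip(self.mu, x))
        return 1 if score >= 0 else -1

    def update(self, x, y):
        """Process one example (x, y) with y in {-1, +1}; return True if updated."""
        phi, psi, zeta = self.phi, self.psi, self.zeta
        n = len(x)
        # sx = sigma @ x, v = x^T sigma x (variance of the margin),
        # m = y * (mu . x) (mean of the margin).
        sx = [sum(self.sigma[i][j] * x[j] for j in range(n)) for i in range(n)]
        v = sum(xi * si for xi, si in zip(x, sx))
        m = y * sum(mi * xi for mi, xi in zip(self.mu, x))

        # Passive step: no update when the confidence-adjusted margin is met.
        if v <= 0.0 or phi * math.sqrt(v) - m <= 0.0:
            return False

        # Assumed SCW-I step size, clipped to the interval [0, C].
        alpha = (-m * psi + math.sqrt(m * m * phi ** 4 / 4.0
                                      + v * phi * phi * zeta)) / (v * zeta)
        alpha = min(self.C, max(0.0, alpha))
        if alpha == 0.0:
            return False

        sqrt_u = 0.5 * (-alpha * v * phi
                        + math.sqrt(alpha * alpha * v * v * phi * phi + 4.0 * v))
        beta = alpha * phi / (sqrt_u + v * alpha * phi)

        # Mean moves along sigma @ x; covariance shrinks along the same direction.
        for i in range(n):
            self.mu[i] += alpha * y * sx[i]
        for i in range(n):
            for j in range(n):
                self.sigma[i][j] -= beta * sx[i] * sx[j]
        return True
```

A learner built this way would be driven by a loop that calls `predict` and then `update` on each incoming example. The full covariance matrix costs quadratic time per update in the number of features, so a diagonal approximation is a common trade-off for high-dimensional data.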
