Budgeted Passive-Aggressive Learning for Online Multiclass Classification

Online multiclass classification is an online learning problem in which a learner performs a sequence of multiclass classification tasks, using knowledge gained from previous tasks. The goal is to make correct predictions over this sequence. It is generally considered more complicated than its binary counterpart, online binary classification. A popular algorithm, the passive-aggressive algorithm, was originally proposed for binary problems and later extended to multiclass problems as the multiclass passive-aggressive (MPA) algorithm. MPA lends itself to the kernel trick, which enables better predictions with a kernel-based model. However, this approach suffers from the curse of kernelization, which causes the model's memory usage and runtime to grow without bound. To solve this growth problem, we first introduce a resource perspective that gives an alternative and equivalent interpretation of the kernel-based MPA algorithm. Based on the resource perspective, we propose the budgeted MPA (BMPA) algorithm, which approximates the kernel-based MPA algorithm. BMPA caps the number of available resources through removal and fully exploits the remaining ones through a constrained optimization. We study three removal strategies and give a relative mistake bound that provides a unified analysis. Simulation experiments on various datasets demonstrate that BMPA is effective and competitive with state-of-the-art budgeted online algorithms.
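
To make the curse of kernelization and the role of a budget concrete, the following is a minimal sketch of a kernel-based multiclass passive-aggressive learner that caps the number of stored examples. It is not the BMPA algorithm itself: it uses the common max-score PA-I update, and when the budget is exceeded it simply discards the oldest stored example, which is a stand-in for the removal strategies and the constrained optimization described above. The class name `BudgetedKernelMPA`, the RBF kernel, and the parameters `budget`, `C`, and `gamma` are all assumptions made for illustration.

```python
import numpy as np


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    d = a - b
    return float(np.exp(-gamma * np.dot(d, d)))


class BudgetedKernelMPA:
    """Illustrative kernel multiclass PA learner with a hard budget.

    Assumption: budget maintenance removes the oldest stored example.
    This is a simplification, not the method proposed in the paper.
    """

    def __init__(self, n_classes, budget, C=1.0, gamma=1.0):
        self.n_classes = n_classes
        self.budget = budget
        self.C = C            # aggressiveness cap (PA-I)
        self.gamma = gamma
        self.sv = []          # stored inputs
        self.alpha = []       # one coefficient vector per stored input

    def scores(self, x):
        x = np.asarray(x, dtype=float)
        s = np.zeros(self.n_classes)
        for xi, ai in zip(self.sv, self.alpha):
            s += ai * rbf(xi, x, self.gamma)
        return s

    def predict(self, x):
        return int(np.argmax(self.scores(x)))

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        s = self.scores(x)
        rival = s.copy()
        rival[y] = -np.inf
        r = int(np.argmax(rival))               # highest-scoring wrong class
        loss = max(0.0, 1.0 - (s[y] - s[r]))    # multiclass hinge loss
        if loss == 0.0:
            return                              # passive step: model unchanged
        # Aggressive step: raise the true class, lower the rival class.
        tau = min(self.C, loss / (2.0 * rbf(x, x, self.gamma)))
        a = np.zeros(self.n_classes)
        a[y] += tau
        a[r] -= tau
        self.sv.append(x)
        self.alpha.append(a)
        # Without this check the model grows with every lossy example.
        if len(self.sv) > self.budget:
            self.sv.pop(0)
            self.alpha.pop(0)
```

Each call to `update` that incurs a loss stores one more example, so memory and prediction time grow with the stream unless the budget check removes something. Plain removal discards information; under this sketch's assumptions nothing compensates for the lost example, whereas the constrained optimization described above is meant to make full use of the examples that remain.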
