Adaptive Subgradient Methods for Online AUC Maximization

Learning to maximize AUC is an important research problem in machine learning and artificial intelligence. Unlike traditional batch learning methods for AUC maximization, which often suffer from poor scalability, recent studies have attempted to maximize AUC with single-pass online learning approaches. Despite encouraging reported results, existing online AUC maximization algorithms typically adopt simple online gradient descent, which fails to exploit the geometrical knowledge of the data observed during the online learning process and can therefore suffer from relatively large regret. To address this limitation, we propose a novel algorithm, Adaptive Online AUC Maximization (AdaOAM), which employs an adaptive gradient method that exploits the knowledge of historical gradients to perform more informative online updates. The new adaptive updating strategy of AdaOAM is less sensitive to parameter settings and maintains the same time complexity as previous non-adaptive counterparts. Additionally, we extend the algorithm to handle high-dimensional sparse data (SAdaOAM) and promote sparsity in the solution by performing lazy gradient updates. We analyze the theoretical bounds of both algorithms and evaluate their empirical performance on various types of data sets. The encouraging empirical results clearly highlight the effectiveness and efficiency of the proposed algorithms.
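To make the core idea concrete, the following is a minimal, hypothetical sketch of an AdaGrad-style per-coordinate update applied to a pairwise hinge surrogate of the AUC loss. It is not the actual AdaOAM algorithm (which operates on a pairwise squared loss with buffered statistics); the function name, step size, and loss choice here are illustrative assumptions only. The key point it shows is the accumulator `G` of historical squared gradients, which scales each coordinate's step adaptively.

```python
import numpy as np

def adagrad_pairwise_update(w, G, x_pos, x_neg, eta=0.1, eps=1e-8):
    """One adaptive-subgradient step on the pairwise hinge loss
    max(0, 1 - w.(x_pos - x_neg)), a common surrogate for AUC.
    G accumulates squared gradients per coordinate (AdaGrad-style),
    so frequently-updated coordinates take smaller steps.
    Illustrative sketch, not the published AdaOAM update."""
    diff = x_pos - x_neg
    margin = w @ diff
    if margin < 1.0:                        # hinge loss is active
        g = -diff                           # subgradient of the loss
        G += g * g                          # accumulate squared gradient
        w -= eta * g / (np.sqrt(G) + eps)   # per-coordinate adaptive step
    return w, G

# Usage: stream examples, pairing each new point with buffered
# opposite-class points (as in buffer-based online AUC methods).
w = np.zeros(3)
G = np.zeros(3)
x_pos = np.array([1.0, 0.0, 2.0])
x_neg = np.array([0.0, 1.0, 0.5])
w, G = adagrad_pairwise_update(w, G, x_pos, x_neg)
```

After one step, the model already ranks the positive example above the negative one (`w @ (x_pos - x_neg) > 0`), and `G` retains the gradient history that dampens future steps along already-explored coordinates.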
