An Adaptive Moment Estimation Method for Online AUC Maximization

Area Under the ROC Curve (AUC) is a widely used metric for measuring classification performance, and developing AUC maximization algorithms is of both theoretical and academic value. Traditional methods often apply batch learning algorithms to maximize AUC, which is inefficient and does not scale to large applications. Recently, some online learning algorithms have been introduced that maximize AUC in a single pass over the data. However, these methods sometimes fail to converge to an optimal solution because their learning rates are either fixed or decay too rapidly. To tackle this problem, we propose AdmOAM, an Adaptive Moment estimation method for Online AUC Maximization. It uses estimates of the moments of the gradients to accelerate convergence and to mitigate the rapid decay of the learning rates. We establish a regret bound for the proposed algorithm and conduct extensive experiments to demonstrate its effectiveness and efficiency.
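To make the idea concrete, here is a minimal Python sketch of one way moment estimates of the gradient can drive a one-pass AUC learner. It is an illustration under stated assumptions, not the AdmOAM algorithm itself: the abstract does not give the update rule, so the sketch assumes a linear scorer, a pairwise hinge surrogate for AUC, small reservoir-sampled buffers of past positives and negatives, and the standard Adam update with bias correction. The class name `AdamPairwiseAUC`, the helper `auc`, and all hyperparameter defaults are hypothetical.

```python
import math
import random


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Labels are +1 for positives and anything else for negatives; ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


class AdamPairwiseAUC:
    """Hypothetical one-pass learner: pairwise hinge loss + Adam-style moments."""

    def __init__(self, dim, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8,
                 buffer_size=50, seed=0):
        self.w = [0.0] * dim          # linear scoring weights
        self.m = [0.0] * dim          # first-moment (mean) estimate of gradients
        self.v = [0.0] * dim          # second-moment estimate of gradients
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.buffer_size = buffer_size
        self.buffers = {1: [], -1: []}   # sampled past examples per class
        self.seen = {1: 0, -1: 0}
        self.rng = random.Random(seed)
        self.t = 0                    # number of Adam steps taken

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def _gradient(self, x, y):
        """Average hinge gradient of x paired with buffered opposite-class examples."""
        opposite = self.buffers[-y]
        grad = [0.0] * len(self.w)
        for z in opposite:
            diff = [xi - zi for xi, zi in zip(x, z)]
            margin = y * sum(wi * di for wi, di in zip(self.w, diff))
            if margin < 1.0:          # pair is misordered or inside the margin
                for i, di in enumerate(diff):
                    grad[i] -= y * di / len(opposite)
        return grad

    def _store(self, x, y):
        """Reservoir sampling keeps a uniform sample of each class."""
        buf = self.buffers[y]
        self.seen[y] += 1
        if len(buf) < self.buffer_size:
            buf.append(x)
        else:
            j = self.rng.randrange(self.seen[y])
            if j < self.buffer_size:
                buf[j] = x

    def update(self, x, y):
        """Process one labelled example (y in {+1, -1}) exactly once."""
        if self.buffers[-y]:          # no pairs to learn from otherwise
            g = self._gradient(x, y)
            self.t += 1
            for i, gi in enumerate(g):
                self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * gi
                self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * gi * gi
                m_hat = self.m[i] / (1 - self.beta1 ** self.t)   # bias correction
                v_hat = self.v[i] / (1 - self.beta2 ** self.t)
                self.w[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        self._store(x, y)
```

Each example is seen once via `update(x, y)`, and `auc` can then be applied to `score` outputs on held-out data. Because the step size is divided by a decayed average of squared gradients rather than their running sum, it does not shrink monotonically, which is the behaviour the abstract contrasts with rapidly decaying learning rates.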
