Adaptive Cost-Sensitive Online Classification

Cost-sensitive online classification has drawn extensive attention in recent years. The main approach is to directly optimize, in an online manner, two well-known cost-sensitive metrics: (i) the weighted sum of sensitivity and specificity and (ii) the weighted misclassification cost. However, existing methods consider only first-order information of the data stream. This is insufficient in practice, since many recent studies have shown that incorporating second-order information enhances the prediction performance of classification models. We therefore propose a family of cost-sensitive online classification algorithms with adaptive regularization. We analyze the proposed algorithms theoretically and validate their effectiveness and properties empirically in extensive experiments. Then, for a better trade-off between performance and efficiency, we further introduce the sketching technique into our algorithms, which significantly accelerates computation with only a slight loss in performance. Finally, we apply our algorithms to several real-world online anomaly detection tasks. The promising results indicate that the proposed algorithms are effective and efficient in solving cost-sensitive online classification problems in various real-world domains.
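
The two cost-sensitive metrics named above can be illustrated with a minimal sketch. It only evaluates the metrics on a batch of predictions; it is not the proposed online algorithm. The function names, the weight parameters `eta_p`, `c_p`, and `c_n`, their default values, and the {+1, -1} label encoding are assumptions made for illustration and are not taken from the paper.

```python
def weighted_sum(y_true, y_pred, eta_p=0.5):
    """Weighted sum of sensitivity and specificity.

    Assumes labels in {+1, -1}; eta_p (hypothetical name) weights
    sensitivity and 1 - eta_p weights specificity.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == -1)
    # Guard against a stream that contains only one class.
    sensitivity = tp / n_pos if n_pos else 0.0
    specificity = tn / n_neg if n_neg else 0.0
    return eta_p * sensitivity + (1 - eta_p) * specificity


def weighted_cost(y_true, y_pred, c_p=0.9, c_n=0.1):
    """Weighted misclassification cost.

    c_p (hypothetical name) is the cost of a false negative and
    c_n the cost of a false positive.
    """
    pairs = list(zip(y_true, y_pred))
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    return c_p * fn + c_n * fp


# Imbalanced toy example: 2 positives, 4 negatives.
y_true = [1, 1, -1, -1, -1, -1]
y_pred = [1, -1, -1, -1, -1, 1]
ws = weighted_sum(y_true, y_pred)    # sensitivity 0.5, specificity 0.75
cost = weighted_cost(y_true, y_pred)  # one false negative, one false positive
```

With `eta_p = 0.5` the first metric reduces to balanced accuracy. Making `c_p` larger than `c_n` penalizes missed positives more heavily, which is the usual choice when the positive class is rare, as in anomaly detection.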
