Online sparse class imbalance learning on big data

Class imbalance learning studies problems in which some classes appear far more frequently than others. Most existing works on this problem assume the data to be dense and do not exploit its rich structure, one example of which is sparsity. In the present work, we address the class imbalance problem under a sparsity assumption. More specifically, we maximize the well-known Gmean metric for binary classification under class imbalance, which yields a non-convex loss function. Convex relaxation techniques are used to convert this non-convex problem into a convex one. The resulting problem is formulated in an L1-regularized proximal learning framework and solved via an accelerated stochastic proximal gradient descent algorithm. Our aim in this paper is to show: (i) how proximal algorithms can be applied to a real-world problem (class imbalance); (ii) how the approach scales to big data; and (iii) how it outperforms several recently proposed algorithms in terms of Gmean, F-measure, and mistake rate on a number of benchmark data sets.
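To make the relaxation concrete, the quantity being maximized and one standard convex surrogate can be written as follows; the cost-weighted hinge form below is a common choice for such relaxations and is an illustrative reconstruction, not necessarily the paper's exact objective:

$$\text{Gmean} = \sqrt{\frac{TP}{TP+FN} \cdot \frac{TN}{TN+FP}}, \qquad \min_{w}\ \frac{1}{n}\sum_{i=1}^{n} c_{y_i}\,\max\bigl(0,\, 1 - y_i\, w^\top x_i\bigr) + \lambda\,\|w\|_1,$$

where the class-dependent costs $c_{+1}, c_{-1}$ trade sensitivity against specificity and the $\ell_1$ term induces sparsity. The sketch below shows how such a composite objective can be attacked with an accelerated stochastic proximal gradient method: a stochastic (sub)gradient step on the weighted hinge loss, taken at a Nesterov extrapolation point, followed by the closed-form $\ell_1$ proximal operator (soft thresholding). All function names, the cost weighting, and the step-size choices are assumptions made for illustration, not the authors' exact algorithm:

```python
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of the L1 norm: prox_{tau * ||.||_1}(w)."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def accelerated_prox_sgd(X, y, lam=0.01, eta=0.1, epochs=5):
    """Accelerated stochastic proximal gradient descent for an
    L1-regularized, cost-weighted hinge loss (a convex surrogate
    for Gmean maximization). Illustrative sketch only; the cost
    weights and momentum schedule are assumptions."""
    n, d = X.shape
    w = np.zeros(d)          # current iterate
    w_prev = np.zeros(d)     # previous iterate, kept for momentum
    rho = np.mean(y == 1)    # minority-class ratio used as a cost weight
    t = 1
    for _ in range(epochs):
        for i in np.random.permutation(n):
            # Nesterov extrapolation point
            v = w + ((t - 1.0) / (t + 2.0)) * (w - w_prev)
            margin = y[i] * (X[i] @ v)
            # class-dependent cost: penalize minority-class mistakes more
            c = (1.0 - rho) if y[i] == 1 else rho
            # subgradient of the weighted hinge loss at v
            grad = -c * y[i] * X[i] if margin < 1.0 else np.zeros(d)
            w_prev = w
            # gradient step at the extrapolated point, then the L1 prox
            w = soft_threshold(v - eta * grad, eta * lam)
            t += 1
    return w

# Usage on a toy imbalanced problem (~10% minority class).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = np.where(rng.random(1000) < 0.1, 1, -1)
w = accelerated_prox_sgd(X, y)
print("nonzero weights:", np.count_nonzero(w))
```

The soft-thresholding step is what produces exact zeros in the weight vector, which is how a proximal formulation of this kind delivers sparse models at scale.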
