Imbalanced Learning Based on Logistic Discrimination

In recent years, the imbalanced learning problem has attracted growing attention from both academia and industry. The problem concerns the performance of learning algorithms on data with severely skewed class distributions. In this paper, we apply the well-known statistical model of logistic discrimination to this problem and propose a novel method to improve its performance. To account for class imbalance, we design a new cost function that combines the accuracies of both the positive and negative classes with the precision of the positive class. Unlike traditional logistic discrimination, which is typically fit by maximum likelihood, the proposed method learns its parameters by maximizing this cost function. Experimental results show that, compared with other state-of-the-art methods, the proposed method performs significantly better in terms of recall, g-mean, f-measure, AUC, and accuracy.
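The abstract does not state the cost function itself, so the sketch below does not attempt to reproduce it. It only illustrates, under standard textbook definitions, the quantities the abstract names: per-class accuracy (recall on the positive class, specificity on the negative class), precision of the positive class, g-mean, and f-measure. The function name `imbalance_metrics` is hypothetical, and f-measure is assumed here to mean the balanced F1 score.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Compute common imbalance-aware metrics from confusion-matrix counts.

    tp/fn count positive (minority) examples, fp/tn count negative ones.
    Ratios with an empty denominator are reported as 0.0 (an assumption
    made here for convenience, not a convention taken from the paper).
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # accuracy on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # accuracy on the negative class
    precision = tp / (tp + fp) if (tp + fp) else 0.0     # precision of the positive class
    g_mean = math.sqrt(recall * specificity)             # geometric mean of the two class accuracies
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)              # balanced F-measure (beta = 1 assumed)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "g_mean": g_mean,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # A classifier that labels everything negative on a 10-vs-990 data set.
    print(imbalance_metrics(tp=0, fn=10, fp=0, tn=990))
    # A classifier that recovers most positives at the cost of some false alarms.
    print(imbalance_metrics(tp=8, fn=2, fp=20, tn=970))
```

The first call shows why plain accuracy is misleading under severe skew: predicting the majority class for every example scores 99% accuracy while recall, g-mean, and f-measure are all zero.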
