Robust bounded logistic regression in the class imbalance problem

In this paper, we address two problems that commonly co-occur in practical applications of logistic regression: outliers and class imbalance. We develop a robust bounded logistic regression with different error costs to reduce their combined influence. First, inspired by the correntropy-induced loss function, we derive a bounded logistic loss that is monotonic, bounded, and nonconvex, and therefore robust to outliers; with this loss we construct a new robust logistic regression model. Second, following the principle of cost-sensitive learning, we assign different error costs to the two classes in order to reduce the model's sensitivity to class imbalance. The resulting model is easily optimized with the half-quadratic optimization method. Experimental results demonstrate that the proposed method improves the performance of logistic regression on datasets with outliers and class imbalance.
