A cost-sensitive multi-criteria quadratic programming model for imbalanced data

Abstract

Multiple Criteria Quadratic Programming (MCQP), a mathematical programming-based classification method, was developed recently and has proved effective and scalable. However, its performance degrades when learning from imbalanced data. This paper proposes a cost-sensitive MCQP (CS-MCQP) model that introduces the cost of misclassification into the MCQP model. Empirical tests compare the proposed model with MCQP and a selection of classifiers on 26 imbalanced datasets from the UCI repository. The results indicate that CS-MCQP not only performs better than the optimization-based models (MCQP and SVM), but also outperforms the selected classifiers, ensemble methods, preprocessing techniques and hybrid methods on imbalanced datasets in terms of the AUC and GeoMean measures. To validate the results statistically, Student's t-test and the Wilcoxon signed-rank test were conducted; they show that the superiority of CS-MCQP is statistically significant at the 0.05 significance level. In addition, we analyze the effect of noisy, small-disjunct and overlapping data properties on the proposed model and conclude that CS-MCQP achieves better performance on imbalanced data with overlapping features than on noisy or small-disjunct data.
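As a rough illustration of why a measure such as GeoMean is used on imbalanced data, the sketch below assumes the common definition of the geometric mean as the square root of sensitivity times specificity; the abstract does not spell out the formula, so this definition is an assumption. The function name `geo_mean` and the toy labels are hypothetical and are not taken from the paper's experiments.

```python
import math


def geo_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    # Guard against an empty class to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data (hypothetical): 2 minority samples (1) and 8 majority samples (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate
# but never finds a minority sample.
always_majority = [0] * 10

# A classifier that trades some majority errors for minority recall.
balanced = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(geo_mean(y_true, always_majority))
print(geo_mean(y_true, balanced))
```

On this toy data, the always-majority classifier scores zero despite its high accuracy, because its sensitivity is zero. This is the kind of behaviour that makes such a measure more informative than accuracy when classes are imbalanced.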
