Integrated artificial intelligence-based resizing strategy and multiple criteria decision making technique to form a management decision in an imbalanced environment

Classification on imbalanced datasets remains a challenge in machine learning because class imbalance degrades the performance of many classifiers. This study introduces a two-stage intelligent data preprocessing approach to tackle the class-imbalance problem. In the first stage, the penalty parameter of the support vector machine (SVM) is modified so that the discriminating boundary moves toward the majority class, which causes some majority class examples to be classified as minority class examples. In effect, each majority class example misclassified in this way increases the number of minority class examples, so running the SVM as a preprocessor rebalances the dataset. In the second stage, the modified dataset is passed through a random forest to mitigate the curse of dimensionality. Finally, the preprocessed data are fed into a rule-based classifier to generate comprehensive decision rules. According to the empirical results, the presented architecture is a promising alternative for handling the class-imbalance problem.
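The SVM preprocessing step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear SVM trained by full-batch subgradient descent, a per-class penalty multiplier (`minority_penalty`), and synthetic two-dimensional Gaussian data. The function names, hyperparameters, and toy data are all hypothetical. The sketch only shows the relabelling idea: train with a larger penalty on minority class errors, then replace the original labels with the SVM's predictions.

```python
import random


def train_weighted_linear_svm(X, y, class_weight, lam=0.01, epochs=200, lr=0.1):
    """Full-batch subgradient descent on a class-weighted hinge loss.

    Labels in y are +1 (minority) or -1 (majority). class_weight maps each
    label to its penalty multiplier (an assumed form of the modified
    penalty parameter).
    """
    dim = len(X[0])
    n = len(X)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # example violates the margin
                c = class_weight[yi] / n
                for j in range(dim):
                    grad_w[j] -= c * yi * xi[j]
                grad_b -= c * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Return +1 or -1 for each row of X."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
            for xi in X]


def make_toy_data(seed=0, n_major=90, n_minor=10):
    """Hypothetical imbalanced data: two overlapping Gaussian clusters."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_major)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_minor)]
    y = [-1] * n_major + [1] * n_minor
    return X, y


def svm_relabel(X, y, minority_penalty):
    """Train with a heavier penalty on minority errors, then use the
    SVM's own predictions as the new labels."""
    w, b = train_weighted_linear_svm(X, y, {-1: 1.0, 1: minority_penalty})
    return predict(X, w, b)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("original minority count:", y.count(1))
    for penalty in (1.0, 5.0, 20.0):
        new_y = svm_relabel(X, y, penalty)
        print("penalty", penalty, "-> relabelled minority count:", new_y.count(1))
```

Running the script prints how many examples carry the minority label after relabelling at each penalty value. How far the counts shift depends on the data overlap and the training settings, so the penalty would need tuning in practice.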
