Evidential reasoning based ensemble classifier for uncertain imbalanced data

Abstract: Many studies have addressed the classification of uncertain data or of imbalanced data, but few have considered data that are both uncertain and imbalanced. To address this gap, this study proposes an evidential reasoning (ER) based ensemble classifier (EREC). In the proposed method, an oversampling method based on affinity propagation is developed to balance the class distribution of the training dataset for each individual classifier. Using the balanced training datasets, ER-based classifiers are constructed as the individual classifiers to handle data uncertainty; their attribute weights are learned from the similarity between attribute values and labels. The final result is generated by combining the outputs of the trained individual classifiers with the ER algorithm, where the weight of each individual classifier is determined by its classification performance on out-of-bag data. The proposed EREC is applied to the diagnosis of thyroid nodules using datasets from five radiologists, obtained from a tertiary hospital in Hefei, Anhui, China. On these real datasets and on UCI datasets, the EREC is compared with 12 representative ensemble classifiers and with ensemble classifiers based on other oversampling methods to demonstrate its high performance.
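The combination step described in the abstract can be illustrated with a minimal sketch. It is not the paper's ER algorithm: it swaps in classical Shafer discounting followed by Dempster's rule, and it assumes that each classifier's out-of-bag accuracy is used directly as its weight. The names `discount`, `dempster`, `combine`, and `THETA`, as well as the belief values and weights in the usage note, are hypothetical and chosen only for illustration.

```python
from functools import reduce

THETA = "Theta"  # the whole frame of discernment, i.e. unassigned belief


def discount(beliefs, weight):
    """Shafer discounting: scale each class belief by the classifier weight
    and move the remaining mass to THETA (ignorance)."""
    mass = {label: weight * b for label, b in beliefs.items()}
    mass[THETA] = 1.0 - sum(mass.values())
    return mass


def dempster(m1, m2):
    """Dempster's rule for masses on singleton classes plus THETA.
    Both inputs are assumed to share the same class labels."""
    labels = [k for k in m1 if k != THETA]
    raw = {
        k: m1[k] * m2[k] + m1[k] * m2[THETA] + m1[THETA] * m2[k]
        for k in labels
    }
    raw[THETA] = m1[THETA] * m2[THETA]
    total = sum(raw.values())  # equals 1 minus the conflicting mass
    if total == 0:
        raise ValueError("classifier outputs are in total conflict")
    return {k: v / total for k, v in raw.items()}


def combine(outputs, weights):
    """Fuse per-classifier belief distributions, each discounted by its
    weight (assumed here to be an out-of-bag accuracy in [0, 1])."""
    return reduce(dempster, (discount(b, w) for b, w in zip(outputs, weights)))
```

For example, `combine([{"benign": 0.8, "malignant": 0.2}, {"benign": 0.6, "malignant": 0.4}], [0.9, 0.5])` returns a fused distribution over `benign`, `malignant`, and `Theta`. The classifier with the lower weight contributes less to the fused class beliefs and leaves more mass on `Theta`.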
