Class Imbalance and Cost-Sensitive Decision Trees

Class imbalance treatment methods and cost-sensitive classification algorithms are typically treated as two independent research areas. However, many of these techniques have properties in common. ...
