Cost-Sensitive Pattern-Based Classification for Class Imbalance Problems

In several problems, contrast pattern-based classifiers achieve high accuracy and explain their results in terms of the patterns used for classification. However, class imbalance problems pose a great challenge for these classifiers: one class has significantly fewer objects than the remaining classes, which biases classification toward the majority class. Therefore, in this paper, we propose an algorithm for discovering cost-sensitive patterns in class imbalance problems, together with a pattern-based classifier that uses these patterns for classification. Our proposal fuses pattern discovery with the cost-sensitive approach to class imbalance. Our experiments show that the proposal obtains cost-sensitive patterns that attain significantly lower misclassification cost than patterns mined by other well-known state-of-the-art pattern miners. We also show that the proposed pattern-based classifier is suitable for working with cost-sensitive patterns.
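To make the evaluation criterion concrete, the following minimal sketch shows why misclassification cost, rather than accuracy, is the natural measure under class imbalance. It is not the algorithm proposed in the paper: the data, the cost matrix, and the function names are hypothetical and chosen only for illustration, with the cost of missing a minority object assumed equal to the imbalance ratio (95/5 = 19).

```python
from typing import Sequence


def misclassification_cost(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost: Sequence[Sequence[float]],
) -> float:
    """Sum cost[true][pred] over all objects (correct predictions cost 0 here)."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of objects whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced problem: 95 majority objects (class 0), 5 minority (class 1).
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts the majority class.
pred_a = [0] * 100

# Classifier B raises 10 false alarms but finds 4 of the 5 minority objects.
pred_b = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

# Assumed cost matrix, indexed as cost[true][pred]:
# a false alarm costs 1, a missed minority object costs 19.
cost = [[0, 1],
        [19, 0]]

acc_a, acc_b = accuracy(y_true, pred_a), accuracy(y_true, pred_b)
cost_a = misclassification_cost(y_true, pred_a, cost)
cost_b = misclassification_cost(y_true, pred_b, cost)

print(f"A: accuracy={acc_a:.2f}, cost={cost_a}")
print(f"B: accuracy={acc_b:.2f}, cost={cost_b}")
```

Under these assumed costs, classifier A has the higher accuracy (0.95 versus 0.89) yet the higher total cost (95 versus 29), which is the majority-class bias the abstract describes.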
