Empirical Study of Associative Classifiers on Imbalanced Datasets in KEEL

This article presents an empirical performance analysis of associative classification approaches on imbalanced datasets. An imbalanced dataset is one in which the number of instances of one class differs drastically from that of the other. This class imbalance strongly affects classifier performance. Associative classification is a hybrid technique that combines classification rule discovery and association rule discovery, both of which are important tasks in knowledge discovery. We investigate the performance of selected associative classifiers, namely CBA, CBA2, CMAR-C, CPAR-C, and Fuzzy-FARCHD-C, using the implementations provided in the KEEL data mining tool on public imbalanced datasets. The experimental results show that Fuzzy-FARCHD-C performs promisingly in terms of accuracy compared with the other methods.
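To make the notion of class imbalance concrete, the following minimal sketch computes an imbalance ratio (majority class count divided by minority class count) and the accuracy of a trivial classifier that always predicts the majority class. The function name `imbalance_ratio` and the toy labels are assumptions chosen for illustration; they are not taken from the study or from KEEL.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count.

    Hypothetical helper for illustration only; not part of KEEL.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


# Toy label vector (an assumption, not data from the study):
# 95 negative instances and 5 positive instances.
labels = ["neg"] * 95 + ["pos"] * 5

ratio = imbalance_ratio(labels)

# Accuracy obtained by always predicting the majority class.
majority_baseline_accuracy = max(Counter(labels).values()) / len(labels)

print(f"imbalance ratio: {ratio:.1f}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.2f}")
```

On such data a classifier can reach high accuracy while never predicting the minority class, which may be worth keeping in mind when accuracy is the reported measure.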
