On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets

In classification tasks, we often encounter data-sets whose classes are unevenly represented among the patterns. This problem, known as classification with imbalanced data-sets, appears in many real application areas, and for this reason it has recently become a relevant topic in Machine Learning. The aim of this work is to improve the behaviour of fuzzy rule based classification systems (FRBCSs) in the framework of imbalanced data-sets by means of a tuning step. Specifically, we adapt the 2-tuples based genetic tuning approach to classification problems and show the good synergy between this method and some FRBCSs. Our empirical results show that the 2-tuples based genetic tuning increases the performance of FRBCSs in all types of imbalanced data. Furthermore, when the initial Rule Base, built by a fuzzy rule learning methodology, already behaves well in terms of accuracy, applying the genetic 2-tuples post-processing step yields a higher improvement in performance for the whole model. This enhancement is also obtained in cooperation with a preprocessing stage, which confirms the need to rebalance the training set before the learning phase when dealing with imbalanced data.
