A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets

In classification problems, the classes often differ greatly in the percentage of patterns they contain: some classes have a high percentage of patterns and others a low one. Such problems are known as "classification problems with imbalanced data-sets". In this paper we study the behaviour of fuzzy rule based classification systems in the framework of imbalanced data-sets, focusing on the synergy between instance preprocessing mechanisms and the configuration of the fuzzy rule based classification system. We analyse the necessity of applying a preprocessing step to deal with the problem of imbalanced data-sets. Regarding the components of the fuzzy rule based classification system, we are interested in the granularity of the fuzzy partitions, the use of different conjunction operators, the approaches used to compute the rule weights, and the use of different fuzzy reasoning methods.
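To make two of the components named above concrete, the following is a minimal sketch of how the granularity of a fuzzy partition and the choice of conjunction operator can affect the matching degree between a pattern and a rule antecedent. It assumes uniformly spaced triangular membership functions on attributes normalised to [0, 1], and compares the minimum and product T-norms. These are common choices in linguistic fuzzy rule based classification, but they are assumptions made here for illustration, not a configuration taken from the paper. All function names are hypothetical.

```python
from functools import reduce


def triangular_partition(granularity):
    """Vertices (a, b, c) of `granularity` uniformly spaced triangular
    fuzzy sets on [0, 1] (assumed partition shape, needs granularity >= 2)."""
    step = 1.0 / (granularity - 1)
    return [((i - 1) * step, i * step, (i + 1) * step)
            for i in range(granularity)]


def membership(x, triangle):
    """Membership degree of x in a triangular fuzzy set (a, b, c)."""
    a, b, c = triangle
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def min_tnorm(u, v):
    """Minimum T-norm as the conjunction operator."""
    return min(u, v)


def product_tnorm(u, v):
    """Product T-norm as the conjunction operator."""
    return u * v


def matching_degree(pattern, antecedent, partition, tnorm):
    """Combine the per-attribute membership degrees with the given T-norm.

    `antecedent` holds one linguistic label index per attribute.
    """
    degrees = [membership(x, partition[label])
               for x, label in zip(pattern, antecedent)]
    return reduce(tnorm, degrees)


# Hypothetical example: three labels per attribute and a two-attribute pattern.
partition = triangular_partition(3)
pattern = (0.25, 0.375)
antecedent = (0, 1)  # first attribute "low", second attribute "medium"

degree_min = matching_degree(pattern, antecedent, partition, min_tnorm)
degree_prod = matching_degree(pattern, antecedent, partition, product_tnorm)
```

With these values the per-attribute memberships are 0.5 and 0.75, so the minimum T-norm yields 0.5 and the product T-norm yields 0.375. Increasing `granularity` narrows each triangle, so the same pattern matches fewer rules and each rule covers a smaller region of the input space.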
