Extracting a Fuzzy System by Using Genetic Algorithms for Imbalanced Datasets Classification: Application on Down's Syndrome Detection

This chapter presents a new methodology for extracting a Fuzzy System by means of Genetic Algorithms, aimed at classifying imbalanced datasets when the intelligibility of the Fuzzy Rules matters. We propose a method for constructing fuzzy variables that modifies the set of fuzzy variables obtained by the DDA/RecBF clustering algorithm. These variables are then recombined into Fuzzy Rules by a Genetic Algorithm. The method was developed for prenatal detection of Down's syndrome during the second trimester of pregnancy. We present empirical results showing its accuracy on this task, together with more general experimental results on UCI datasets indicating that the method is applicable to imbalanced datasets more widely.
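As a rough illustration of the pipeline summarized above, the following sketch recombines predefined fuzzy sets into a single rule with a small genetic algorithm. It is not the chapter's implementation: the trapezoidal membership shape, the min-based rule firing, the geometric mean of sensitivity and specificity as fitness, and all names (`trapezoid`, `rule_strength`, `g_mean`, `evolve`) are assumptions made for this example, and the fuzzy sets are supplied by hand rather than derived from DDA/RecBF.

```python
import math
import random

def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoid with support [a, d] and core [b, c] (assumed shape)."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def rule_strength(rule, sample, fuzzy_sets):
    """Firing strength of a rule: minimum membership over the variables it uses.

    rule[i] is the index of a fuzzy set of variable i, or None for "don't care".
    """
    degrees = [trapezoid(sample[i], *fuzzy_sets[i][k])
               for i, k in enumerate(rule) if k is not None]
    return min(degrees) if degrees else 0.0

def g_mean(rule, X, y, fuzzy_sets, threshold=0.5):
    """Assumed fitness: geometric mean of sensitivity and specificity,
    which does not reward simply predicting the majority class."""
    tp = fn = tn = fp = 0
    for sample, label in zip(X, y):
        predicted = rule_strength(rule, sample, fuzzy_sets) >= threshold
        if label == 1:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def evolve(X, y, fuzzy_sets, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Tiny GA: binary tournament selection, uniform crossover, per-gene mutation, elitism."""
    rng = random.Random(seed)
    choices = [[None] + list(range(len(sets))) for sets in fuzzy_sets]

    def fitness(rule):
        return g_mean(rule, X, y, fuzzy_sets)

    pop = [[rng.choice(c) for c in choices] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]  # keep the best rule found so far
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            child = [rng.choice(choices[i]) if rng.random() < p_mut else gene
                     for i, gene in enumerate(child)]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy imbalanced data: one variable, four negatives, one positive.
fuzzy_sets = [[(0, 1, 2, 3), (5, 6, 7, 8)]]  # hand-made "low" and "high" sets
X = [[1.5], [1.0], [2.0], [1.2], [6.5]]
y = [0, 0, 0, 0, 1]
best = evolve(X, y, fuzzy_sets)
```

On this toy data a rule that always predicts the majority class scores zero under the assumed fitness, which is the reason for choosing a class-balanced measure over plain accuracy in the sketch.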
