Genetic fuzzy discretization with adaptive intervals for classification problems

We propose a genetic fuzzy discretization method for continuous numerical attributes. Traditional discretization methods partition each continuous attribute into a number of bins. Because these bins have crisp boundaries, considerable information is lost. Fuzzy discretization allows overlapping intervals and reflects linguistic classification. However, the number of intervals, the interval boundaries, and the degrees of overlap are difficult to optimize. We use a genetic algorithm to optimize these parameters. Experimental results showed considerable improvement in classification accuracy over a crisp discretization and a typical fuzzy discretization.
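To make the contrast between crisp and fuzzy discretization concrete, the following is a minimal sketch and not the method of the paper. It assumes that each cut point is softened into a linear ramp whose width plays the role of the degree of overlap; the abstract does not specify the membership shape, and the names `crisp_bin`, `fuzzy_memberships`, `cuts`, and `overlaps` are hypothetical. Under this assumption, a candidate solution for the genetic algorithm could be encoded as the pair (`cuts`, `overlaps`), whose lengths also fix the number of intervals.

```python
from bisect import bisect_right


def crisp_bin(x, cuts):
    """Index of the crisp interval containing x, given sorted cut points."""
    return bisect_right(cuts, x)


def fuzzy_memberships(x, cuts, overlaps):
    """Membership degree of x in each of the len(cuts) + 1 fuzzy intervals.

    Assumption (not from the source): cut point c with overlap width w is
    replaced by a linear ramp over [c - w/2, c + w/2]; w == 0 is a crisp cut.
    """
    # rise[i] is the degree to which x lies to the right of cut i.
    rise = []
    for c, w in zip(cuts, overlaps):
        if w == 0:
            rise.append(1.0 if x >= c else 0.0)
        else:
            t = (x - (c - w / 2)) / w
            rise.append(min(1.0, max(0.0, t)))

    # Interval k lies to the right of cut k-1 and to the left of cut k.
    degrees = []
    for k in range(len(cuts) + 1):
        right_of_prev = rise[k - 1] if k > 0 else 1.0
        left_of_next = 1.0 - rise[k] if k < len(cuts) else 1.0
        degrees.append(min(right_of_prev, left_of_next))
    return degrees


if __name__ == "__main__":
    cuts, overlaps = [10, 20], [4, 4]
    print(crisp_bin(11, cuts))                    # a single bin index
    print(fuzzy_memberships(11, cuts, overlaps))  # one degree per interval
```

With cut points at 10 and 20, a value of 11 falls entirely into the middle bin under crisp discretization, whereas the fuzzy version assigns it partial membership in both the first and the middle interval, which preserves the information that it lies close to a boundary.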
