Influence of clustering pre-processing on genetically generated fuzzy knowledge bases

Automatic knowledge base generation using techniques such as genetic algorithms tends to be highly dependent on the quality and size of the learning data. First, large data sets can waste time when a smaller data set would describe the problem equally well. Second, noise and outliers can cause the learning algorithm to degenerate. Clustering techniques make it possible to compress and filter the data, so that fuzzy knowledge bases (FKBs) can be generated faster and more accurately. Different clustering algorithms are compared, and validation of the results on a theoretical 3D surface shows that, when the data are compressed to 5% of their original size, clustering algorithms accelerate the learning process by up to 94%. Moreover, when the learning data contain noise and/or a large number of outliers, clustering algorithms can make the results more stable and improve the fitness of the obtained FKBs.
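The compression step can be illustrated with a minimal sketch. The abstract does not say which clustering algorithms were compared, so this assumes plain k-means (Lloyd's iteration) and replaces the learning set by roughly 5% as many cluster centroids. The function names `kmeans` and `compress`, the `min_share` threshold used to drop very small clusters as likely outliers, and the test surface are all hypothetical. If the cost of evaluating a candidate FKB's fitness grows roughly linearly with the number of samples (also an assumption), a 5% data set bounds the saving near 95%, which is consistent with the reported speed-up of up to 94%.

```python
import math
import random


def _assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples (assumed algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    for _ in range(iters):
        labels = _assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Update step: move the centroid to the mean of its members.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, _assign(points, centroids)


def compress(points, ratio=0.05, min_share=0.0, seed=0):
    """Replace the learning set by about ratio * len(points) centroids.

    Hypothetical filtering rule: empty clusters, and clusters holding fewer than
    min_share of all points, are discarded as likely outliers.
    """
    k = max(1, round(ratio * len(points)))
    centroids, labels = kmeans(points, k, seed=seed)
    sizes = [labels.count(j) for j in range(k)]
    return [c for c, n in zip(centroids, sizes)
            if n > 0 and n >= min_share * len(points)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical test surface z = sin(x) * cos(y) on a 20 x 20 grid, with noise.
    data = []
    for i in range(20):
        for j in range(20):
            x, y = i / 19 * math.pi, j / 19 * math.pi
            data.append((x, y, math.sin(x) * math.cos(y) + rng.gauss(0, 0.05)))
    reduced = compress(data, ratio=0.05, min_share=0.01)
    print(len(data), "->", len(reduced))
```

Averaging points into centroids also smooths zero-mean noise, which is one plausible reason a compressed set can give more stable results; the genetic algorithm would then be trained on `reduced` instead of `data`.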