Categorical Feature Reduction Using Multi Objective Genetic Algorithm in Cluster Analysis

This paper studies a real-coded multi-objective genetic algorithm based K-clustering method, where K is the number of clusters and is assumed to be known in advance. The search capability of the Genetic Algorithm (GA) is exploited to find suitable clusters and cluster centers so that intra-cluster distance (Homogeneity, H) and inter-cluster distance (Separation, S) are optimized simultaneously. Both H and S are measured with the Mod distance per feature metric, which is suitable for categorical features (attributes). Three benchmark data sets containing only categorical features were selected from the UCI Machine Learning Repository.
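The two objectives can be illustrated with a minimal sketch. The abstract does not define the Mod distance per feature metric, so the code below assumes a stand-in: the fraction of features on which a record differs from a cluster's per-feature mode. The function names, the averaging used for H and S, and the toy data are all hypothetical choices made for illustration, not the authors' formulation.

```python
from collections import Counter

def mismatch_per_feature(a, b):
    """Assumed stand-in metric: fraction of features on which a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def cluster_mode(records):
    """Most frequent category per feature (ties go to the first one seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def homogeneity_and_separation(records, labels, k):
    """Return (H, S) for a labelling with k >= 2 non-empty clusters.

    H: mean distance from each record to the mode of its own cluster
       (smaller means tighter clusters).
    S: mean distance between the modes of every pair of clusters
       (larger means better separated clusters).
    """
    clusters = [[r for r, l in zip(records, labels) if l == c] for c in range(k)]
    modes = [cluster_mode(c) for c in clusters]
    h = sum(mismatch_per_feature(r, modes[l])
            for r, l in zip(records, labels)) / len(records)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    s = sum(mismatch_per_feature(modes[i], modes[j]) for i, j in pairs) / len(pairs)
    return h, s

# Toy categorical data: two features, two clusters.
records = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "z")]
labels = [0, 0, 1, 1]
h, s = homogeneity_and_separation(records, labels, k=2)
print(h, s)
```

A multi-objective GA would treat each candidate clustering as an individual and search for solutions that minimize H while maximizing S, keeping the non-dominated trade-offs rather than collapsing the two values into a single score.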
