Simultaneous feature selection and clustering with mixed features by multi objective genetic algorithm

In this paper, we propose a novel evolutionary clustering algorithm for mixed-type data, i.e. data with both numerical and categorical features. The algorithm performs clustering and feature selection simultaneously. Feature subset selection improves the quality of clustering, as well as its understandability and scalability. Because it handles mixed features, the approach is not restricted to purely numerical or purely categorical datasets. K-prototype (KP) is a well-known partitional clustering algorithm for mixed-type data. However, this type of algorithm is sensitive to initialization and may converge to local optima. It also optimizes only a single measure, namely the minimization of intra-cluster distance. We instead treat clustering as a multi-objective optimization problem (MOOP) with two objectives: minimization of intra-cluster distance and maximization of inter-cluster distance. The multi-objective genetic algorithm (MOGA) is a well-known algorithm that can be applied to a MOOP to find near-globally-optimal solutions. We therefore develop a hybridized genetic clustering algorithm that combines the global search ability of MOGA with the local search ability of KP. Experiments on real-life benchmark datasets from the UCI machine learning repository demonstrate the superiority of the proposed algorithm.
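The two objectives named above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the dissimilarity follows the usual K-prototype form (squared Euclidean distance on numerical features plus a weight `gamma` times the number of categorical mismatches), feature selection is modeled as a boolean `mask`, and the function names `mixed_distance`, `objectives`, and `dominates` are hypothetical.

```python
from itertools import combinations


def mixed_distance(x, y, is_categorical, mask, gamma=1.0):
    """K-prototype-style dissimilarity over the selected features only.

    Assumed form: squared difference for numerical features,
    gamma * mismatch (0 or 1) for categorical features.
    """
    total = 0.0
    for xi, yi, cat, keep in zip(x, y, is_categorical, mask):
        if not keep:
            continue  # feature dropped by the (hypothetical) selection mask
        if cat:
            total += gamma * (xi != yi)
        else:
            total += (xi - yi) ** 2
    return total


def objectives(data, labels, prototypes, is_categorical, mask, gamma=1.0):
    """Return (intra, inter): intra is to be minimized, inter maximized."""
    # Intra-cluster distance: each point to the prototype of its own cluster.
    intra = sum(
        mixed_distance(row, prototypes[k], is_categorical, mask, gamma)
        for row, k in zip(data, labels)
    )
    # Inter-cluster distance: summed over all pairs of prototypes.
    inter = sum(
        mixed_distance(p, q, is_categorical, mask, gamma)
        for p, q in combinations(prototypes, 2)
    )
    return intra, inter


def dominates(a, b):
    """Pareto dominance for (intra, inter) pairs."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


# Toy data: one numerical and one categorical feature, two clusters.
data = [(1.0, "a"), (2.0, "a"), (9.0, "b"), (10.0, "b")]
labels = [0, 0, 1, 1]
prototypes = [(1.5, "a"), (9.5, "b")]
is_categorical = [False, True]

both = objectives(data, labels, prototypes, is_categorical, [True, True])
print(both)  # (1.0, 65.0)
```

A multi-objective genetic algorithm would evaluate such a pair for every candidate solution (a feature mask together with a set of prototypes) and keep the non-dominated ones. How the proposed algorithm encodes chromosomes and interleaves KP steps is not described in the abstract and is not modeled here.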
