Automatic clustering and feature selection using gravitational search algorithm and its application to microarray data analysis

This paper proposes a novel approach that automatically and simultaneously determines the number of clusters and selects the relevant features. The gravitational search algorithm is used as the underlying metaheuristic. A novel agent representation scheme encodes the cluster centers and the number of features, so the algorithm can find the optimal number of clusters and the features relevant to those clusters at run time. A new threshold-setting concept is introduced, and the variance (a statistical property) of the dataset is exploited. To make the search efficient, a novel clustering criterion is used. The proposed approach is compared with recently developed, well-known clustering techniques and is further applied to the analysis of microarray data. Statistical and biological significance tests are performed to demonstrate the efficiency of the proposed approach. The results indicate that the proposed algorithm is effective and accurate.
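
For readers unfamiliar with the metaheuristic named in the abstract, the following is a minimal sketch of one iteration of the standard gravitational search algorithm for minimization. It is not the paper's method: the agent encoding, threshold setting, and clustering criterion described above are not reproduced here. The function name `gsa_step`, the default values of `g0` and `alpha`, and the omission of the usual "Kbest" restriction (every agent attracts every other) are assumptions made for illustration.

```python
import math
import random


def gsa_step(positions, velocities, fitness, t, max_iter,
             g0=100.0, alpha=20.0, eps=1e-12, rng=random):
    """Illustrative single iteration of standard GSA (minimization).

    positions, velocities: lists of equal-length float lists, one per agent.
    fitness: one value per agent, lower is better.
    g0, alpha: assumed defaults for the decaying gravitational constant.
    """
    n = len(positions)
    dim = len(positions[0])

    # Map fitness to normalised masses: the best agent is the heaviest.
    best, worst = min(fitness), max(fitness)
    if worst == best:
        raw = [1.0] * n
    else:
        raw = [(worst - f) / (worst - best) for f in fitness]
    total = sum(raw)
    masses = [m / total for m in raw]

    # The gravitational constant decays over time to shift the search
    # from exploration to exploitation.
    g = g0 * math.exp(-alpha * t / max_iter)

    new_positions, new_velocities = [], []
    for i in range(n):
        acc = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            dist = math.dist(positions[i], positions[j])
            for d in range(dim):
                # Acceleration is force divided by the agent's own mass,
                # so only the attracting agent's mass remains.
                acc[d] += (rng.random() * g * masses[j]
                           * (positions[j][d] - positions[i][d])
                           / (dist + eps))
        vel = [rng.random() * velocities[i][d] + acc[d] for d in range(dim)]
        new_velocities.append(vel)
        new_positions.append([positions[i][d] + vel[d] for d in range(dim)])
    return new_positions, new_velocities
```

In a clustering setting, each agent's position vector would hold the encoded solution and `fitness` would come from the clustering criterion; both are left to the caller in this sketch.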
