MOSCFRA: A Multi-objective Genetic Approach for Simultaneous Clustering and Gene Ranking

Microarray experiments generate large amounts of data that are used to uncover the genetic background of diseases and to characterize genes. Clustering tissue samples by their co-expression behavior and characteristics is an important tool for partitioning such datasets, but finding the clusters of a given dataset is a difficult task. The task becomes harder still when each gene must also be ranked by its ability to distinguish different classes of samples, a problem known as gene ranking. Many algorithms in the literature address sample clustering and gene ranking (or gene selection) separately, and a few address simultaneous clustering and feature selection. In this article, we propose a new approach that clusters the samples and ranks the genes simultaneously. A novel chromosome encoding technique is proposed for this purpose, and the work is accomplished using a multi-objective evolutionary technique. Results are demonstrated on both artificial and real-life gene expression datasets.
