Multiobjective Simulated Annealing-Based Clustering of Tissue Samples for Cancer Diagnosis

In pattern recognition, microarray technology has made it feasible to study the gene expression profiles of tissue samples across different experimental conditions. In cancer research, classifying tissue samples is a necessary step in diagnosis, and microarray data make this possible. In this paper, we present a multiobjective optimization (MOO)-based clustering technique that uses archived multiobjective simulated annealing (AMOSA) as the underlying optimization strategy to classify tissue samples from cancer datasets. The technique is evaluated on three open-source benchmark cancer datasets: Brain tumor, Adult Malignancy, and Small Round Blue Cell Tumors (SRBCT). To assess the quality of the resulting clusters, two measures are calculated: the adjusted Rand index and classification accuracy (% CoA). Results are compared against ten state-of-the-art clustering techniques on the three benchmark datasets. We also conduct a t-test to establish that the improvement of the presented MOO-based technique over the other techniques is statistically significant. Moreover, significant gene markers are identified from the clustering solutions obtained and presented visually. This study can have an important impact on cancer subtype prediction.
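As an illustration of the two cluster quality measures named above, the following minimal sketch computes the adjusted Rand index from its standard contingency-table definition, together with a percentage classification accuracy. The abstract does not state how % CoA is obtained from cluster labels, so the majority-vote mapping used in `percent_accuracy` is an assumption, and both function names are hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, cluster_labels):
    """Adjusted Rand index between a reference labelling and a clustering."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no sample pairs to compare
    # Contingency counts n_ij and their row/column marginals.
    pairs = Counter(zip(true_labels, cluster_labels))
    rows = Counter(true_labels)
    cols = Counter(cluster_labels)
    index = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (index - expected) / (max_index - expected)


def percent_accuracy(true_labels, cluster_labels):
    """Percentage of samples matching their cluster's majority class.

    Assumption: each cluster is mapped to its most frequent true class;
    the paper may define % CoA differently.
    """
    by_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    correct = sum(max(counts.values()) for counts in by_cluster.values())
    return 100.0 * correct / len(true_labels)


if __name__ == "__main__":
    # Toy labels for six tissue samples (made up for illustration only).
    truth = [0, 0, 0, 1, 1, 1]
    clusters = [0, 0, 1, 1, 2, 2]
    print(adjusted_rand_index(truth, clusters))
    print(percent_accuracy(truth, clusters))
```

The adjusted Rand index is invariant to how clusters are numbered, so no label matching is needed for it; only the accuracy computation requires a cluster-to-class mapping.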
