Multi-objective clustering of tissue samples for cancer diagnosis

In pattern recognition, microarray technology has made it feasible to study the gene expression profiles of different tissue samples across different experimental conditions. In cancer research, classifying tissue samples is a necessary step in diagnosis, and microarray data can support it. In this article we present a multi-objective optimization (MOO) based clustering technique that uses AMOSA (Archived Multi-Objective Simulated Annealing) as the underlying optimization strategy to classify tissue samples from cancer data sets. Three cluster validity indices, namely the XB, PBM, and FCM indices, serve as objective functions and are optimized simultaneously to form more accurate clusters of tissue samples. The technique is evaluated on two open-source benchmark cancer data sets: the Brain tumor data set and the Adult Malignancy data set. To assess the quality of the resulting clusters, two measures, the Adjusted Rand Index (ARI) and the Classification Accuracy (%CoA), are calculated for each data set. Results on both benchmark data sets are compared against 10 existing state-of-the-art single-objective and multi-objective clustering algorithms.
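As a rough illustration of two quantities named in the abstract, the sketch below computes the Adjusted Rand Index from two labelings and the XB (Xie-Beni) index from a set of points, cluster centers, and a membership matrix. It is not the authors' implementation. The function names, the data layout (`membership[i][k]` is the degree of point `k` in cluster `i`), and the fuzzifier default `m=2.0` are assumptions made for this example, and the standard textbook definitions of both indices are assumed to match the ones used in the article.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings (needs at least 2 samples)."""
    n = len(true_labels)
    # Contingency counts: how many samples share each (true, predicted) pair.
    cells = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centers, membership, m=2.0):
    """XB index (lower is better); requires at least two cluster centers."""
    n = len(points)
    k = len(centers)
    # Membership-weighted within-cluster scatter.
    compactness = sum(
        membership[i][p] ** m * sq_dist(points[p], centers[i])
        for i in range(k)
        for p in range(n)
    )
    # Smallest squared distance between any two distinct centers.
    separation = min(
        sq_dist(centers[i], centers[j])
        for i in range(k)
        for j in range(i + 1, k)
    )
    return compactness / (n * separation)
```

A crisp partition is the special case where each membership value is 0 or 1. In a multi-objective setting such as the one described, an index like XB would be one of several objectives evaluated on each candidate clustering, while ARI would be used afterwards to compare the chosen clustering against known class labels.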
