Robust fuzzy clustering techniques for analyzing complicated colon cancer database

Identifying subgroups of genes from high-dimensional microarray gene expression data is useful for discovering cancer subtypes in the Colon cancer database. Clustering analysis for this purpose is extremely difficult because the gene expression data are high-dimensional and noisy. Most existing clustering methods applied to colon cancer data to identify cancer types often yield cluster structures that are hard to interpret. The aim of this paper is therefore to develop suitable clustering techniques, based on fuzzy c-means, the typicality of possibilistic c-means, kernel functions, and a neighborhood term, that identify genes and samples with similar characteristics in order to obtain cancer subtypes in the Colon cancer database. To avoid random selection of the initial prototypes in fuzzy clustering, this paper presents an algorithm for initializing the cluster prototypes. The performance of the proposed methods is evaluated experimentally on a Synthetic dataset and on the Wine, IRIS, Checkerboard, Time series, and Thyroid datasets. The proposed methods are then applied to find cancer subtypes in the Colon cancer database. Compared with recent existing clustering methods on the benchmark datasets and the Colon cancer database, the proposed clustering approach identifies more similar objects within the subgroups. The superiority of the proposed methods is demonstrated through clustering accuracy.
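For orientation, the baseline that the proposed techniques build on can be sketched as follows. This is a minimal sketch of the standard fuzzy c-means algorithm only, not the paper's proposed variants: it omits the possibilistic typicality, kernel functions, and neighborhood term, and it uses plain random initialization instead of the paper's prototype initialization algorithm. The function name `fuzzy_c_means`, its parameters, and the toy two-blob data are assumptions chosen for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative baseline, not the paper's method)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: random initial memberships, each row normalized to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Prototypes: membership-weighted means of the data.
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distance from every point to every prototype.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Toy data (hypothetical): two well-separated 2-D blobs of 20 points each.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.3, (20, 2)),
               data_rng.normal(5.0, 0.3, (20, 2))])
U, V = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)  # harden fuzzy memberships into crisp labels
```

Each row of `U` holds one sample's membership degrees across the clusters, and taking the largest membership per row gives crisp labels that can be compared against known classes to compute a clustering accuracy.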
