Cancer Gene Expression Data Analysis Using Rough Based Symmetrical Clustering

Identifying cancer subtypes is a central goal of cancer gene expression data analysis. Modified symmetry-based clustering is an unsupervised learning technique for detecting symmetrical clusters, whether convex or non-convex in shape. To enable fast, automatic clustering of cancer tissues (samples), the authors propose in this chapter a rough-set-based hybrid approach to the modified symmetry-based clustering algorithm. A natural basis for analyzing gene expression data with a symmetry-based algorithm is to group together genes with similar symmetrical patterns of microarray expression. Rough set theory aids faster convergence and provides an initial, automatic, optimal classification, thereby addressing the problem that the number of clusters in gene expression measurement data is not known in advance. For rough-set-theoretic decision rule generation, each cluster is classified using heuristically searched optimal reducts to overcome the overlapping cluster problem. The rough modified symmetry-based clustering algorithm is compared with another newly implemented rough improved symmetry-based clustering algorithm and with the existing K-Means algorithm on five benchmark cancer gene expression data sets, to demonstrate its superiority in terms of cluster validity. Statistical analyses are also performed to establish the significance of the rough modified symmetry-based clustering approach.
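
To make the idea of symmetry-based grouping concrete, the following is a minimal sketch of one point-symmetry distance used in the wider symmetry-based clustering literature. It is an illustration under stated assumptions, not the chapter's modified measure, which the abstract does not specify. The function names and the `knear` parameter are hypothetical. A sample is reflected through a candidate cluster centre, and the distance is small when other samples lie close to that mirror image.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def point_symmetry_distance(x, center, data, knear=2):
    """Illustrative point-symmetry distance (assumed form, not the chapter's).

    1. Reflect x through the centre: x* = 2 * center - x.
    2. Average the distances from x* to its `knear` nearest samples in `data`.
    3. Weight that average by the Euclidean distance from x to the centre.
    """
    reflected = [2.0 * c - xi for xi, c in zip(x, center)]
    nearest = sorted(euclidean(reflected, p) for p in data)[:knear]
    d_sym = sum(nearest) / len(nearest)
    return d_sym * euclidean(x, center)


def assign_to_center(x, centers, data, knear=2):
    """Return the index of the centre with the smallest point-symmetry distance."""
    return min(
        range(len(centers)),
        key=lambda i: point_symmetry_distance(x, centers[i], data, knear),
    )


# Toy data: four points placed symmetrically around the origin.
samples = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
candidate_centers = [[0.0, 0.0], [5.0, 5.0]]
best = assign_to_center(samples[0], candidate_centers, samples)
```

In this toy case the origin is preferred over the off-centre candidate because the mirror image of `[1.0, 0.0]` through the origin coincides with another sample. In a real expression matrix each sample would be a vector of gene expression values rather than a two-dimensional point.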

[1]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[2]  S. Bandyopadhyay,et al.  Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes , 2009, BMC Bioinformatics.

[3]  Isak Gath,et al.  Detection and Separation of Ring-Shaped Clusters Using Fuzzy Clustering , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Ying Xu,et al.  Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees , 2002, Bioinform..

[5]  Robert Clarke,et al.  Dynamic modelling of oestrogen signalling and cell fate in breast cancer cells , 2011, Nature Reviews Cancer.

[6]  Satoru Miyano,et al.  Open source clustering software , 2004 .

[7]  Rajesh N. Dave,et al.  Use Of The Adaptive Fuzzy Clustering Algorithm To Detect Lines In Digital Images , 1990, Other Conferences.

[8]  R. Tibshirani,et al.  Significance analysis of microarrays applied to the ionizing radiation response , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Ka Yee Yeung,et al.  Validating clustering for gene expression data , 2001, Bioinform..

[10]  Arnold L. Rosenberg,et al.  Bounded-Collision Memory-Mapping Schemes for Data Structures with Applications to Parallel Memories , 2007, IEEE Transactions on Parallel and Distributed Systems.

[11]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[12]  Rainer Spang,et al.  Diagnostic signatures from microarrays: a bioinformatics concept for personalized medicine. , 2003, Drug discovery today.

[13]  Martin Schäfer,et al.  Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review , 2011, Briefings Bioinform..

[14]  Doulaye Dembélé,et al.  Fuzzy C-means Method for Clustering Microarray Data , 2003, Bioinform..

[15]  Ujjwal Maulik,et al.  Performance Evaluation of Some Clustering Algorithms and Validity Indices , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Anirban Mukherjee,et al.  Cancer Classification from Gene Expression Data by NPPC Ensemble , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[17]  Gerardo Beni,et al.  A Validity Measure for Fuzzy Clustering , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  William Stafford Noble,et al.  Kernel hierarchical gene clustering from microarray expression data , 2003, Bioinform..

[19]  L. V. van't Veer,et al.  Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer. , 2011, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

[20]  Ujjwal Maulik,et al.  Development of the human cancer microRNA network , 2010 .

[21]  Ricardo Vilalta,et al.  Introduction to the Special Issue on Meta-Learning , 2004, Machine Learning.

[22]  Robert Clarke,et al.  Motif-guided sparse decomposition of gene expression data for regulatory module identification , 2011, BMC Bioinformatics.

[23]  Jonathan M. Garibaldi,et al.  ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus methods with cross-study normalization , 2009, BMC Bioinformatics.

[24]  Alexander Schliep,et al.  Clustering cancer gene expression data: a comparative study , 2008, BMC Bioinformatics.

[25]  Sanghamitra Bandyopadhyay,et al.  Analysis of Biological Data: A Soft Computing Approach , 2007, Science, Engineering, and Biology Informatics.

[26]  Alexander Schliep,et al.  Ranking and selecting clustering algorithms using a meta-learning approach , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[27]  Rainer Fuchs,et al.  Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters , 2001, Bioinform..

[28]  Chien-Hsing Chou,et al.  Short Papers , 2001 .

[29]  Jerzy W. Grzymala-Busse,et al.  Rough Sets , 1995, Commun. ACM.

[30]  G. Church,et al.  Systematic determination of genetic network architecture , 1999, Nature Genetics.

[31]  S. Bandyopadhyay,et al.  Nonparametric genetic clustering: comparison of validity indices , 2001, IEEE Trans. Syst. Man Cybern. Syst..

[32]  Jorge S. Reis-Filho,et al.  Microarray-Based Class Discovery for Molecular Classification of Breast Cancer: Analysis of Interobserver Agreement , 2011, Journal of the National Cancer Institute.

[33]  Subha Madhavan,et al.  PUGSVM: a caBIGTM analytical tool for multiclass gene selection and predictive classification , 2011, Bioinform..

[34]  Isak Gath,et al.  Unsupervised Optimal Fuzzy Clustering , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[35]  Siddhartha Bhattacharyya,et al.  Efficient Color Image Segmentation by a Parallel Optimized (ParaOptiMUSIG) Activation Function , 2014 .

[36]  Yunhao Liu,et al.  Effectively Utilizing Global Cluster Memory for Large Data-Intensive Parallel Programs , 2006, IEEE Trans. Parallel Distributed Syst..