Rough Based Symmetrical Clustering for Gene Expression Profile Analysis

Identification of coexpressed genes is the central goal in microarray gene expression data analysis. Point symmetry-based clustering is an important unsupervised learning technique for recognizing symmetrical convex or non-convex shaped clusters. To enable fast automatic clustering of large microarray data, in this article, a distributed time-efficient scalable parallel rough set based hybrid approach for point symmetry-based clustering algorithm has been proposed. A natural basis for analyzing gene expression data using the symmetry-based algorithm, is to group together genes with similar symmetrical patterns of expression. Rough-set theory helps in faster convergence and initial automatic optimal classification, thereby solving the problem of unknown knowledge of number of clusters in microarray data. This new parallel implementation with K-means algorithm also satisfies the linear speedup in timing on large microarray datasets. This proposed algorithm is compared with another parallel symmetry-based K-means and parallel version of existing K-means over four artificial and benchmark microarray datasets. We also have experimented over three skewed cancer gene expression datasets. The statistical analysis are also performed to establish the significance of this new implementation. The biological relevance of the clustering solutions are also analyzed.

[1]  Jun Ni,et al.  Mining and Integrating Reliable Decision Rules for Imbalanced Cancer Gene Expression Data Sets , 2012 .

[2]  Gerardo Beni,et al.  A Validity Measure for Fuzzy Clustering , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Michael R. Gawrys,et al.  RSL-The Rough Set Library , 1994 .

[4]  Chen Hao,et al.  Research of Cloud Computing Based on the Hadoop Platform , 2011, 2011 International Conference on Computational and Information Sciences.

[5]  Shusaku Tsumoto,et al.  An Indiscernibility-Based Clustering Method with Iterative Refinement of Equivalence Relations -Rough Clustering- , 2003, Journal of Advanced Computational Intelligence and Intelligent Informatics.

[6]  Doheon Lee,et al.  Detecting clusters of different geometrical shapes in microarray gene expression data , 2005, Bioinform..

[7]  Peter S. Pacheco Parallel programming with MPI , 1996 .

[8]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[9]  D. Botstein,et al.  The transcriptional program in the response of human fibroblasts to serum. , 1999, Science.

[10]  Ron Shamir,et al.  CLICK and EXPANDER: a system for clustering and visualizing gene expression data , 2003, Bioinform..

[11]  Satoru Miyano,et al.  Open source clustering software , 2004 .

[12]  M. Tyers,et al.  Molecular profiling of non-small cell lung cancer and correlation with disease-free survival. , 2002, Cancer research.

[13]  Chien-Hsing Chou,et al.  Short Papers , 2001 .

[14]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[15]  James C. Bezdek,et al.  Pattern Recognition with Fuzzy Objective Function Algorithms , 1981, Advanced Applications in Pattern Recognition.

[16]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[17]  P. Brown,et al.  Exploring the metabolic and genetic control of gene expression on a genomic scale. , 1997, Science.

[18]  Arnold L. Rosenberg,et al.  Bounded-Collision Memory-Mapping Schemes for Data Structures with Applications to Parallel Memories , 2007, IEEE Transactions on Parallel and Distributed Systems.

[19]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[20]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[21]  Pascal Nsoh,et al.  Large-scale temporal gene expression mapping of central nervous system development , 2007 .

[22]  J. Barker,et al.  Large-scale temporal gene expression mapping of central nervous system development. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[23]  Sanghamitra Bandyopadhyay,et al.  GAPS: A clustering method using a new point symmetry-based distance measure , 2007, Pattern Recognit..

[24]  Ujjwal Maulik,et al.  Performance Evaluation of Some Clustering Algorithms and Validity Indices , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Joaquín Dopazo,et al.  FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes , 2004, Bioinform..