Paper information - Clustering Gene Expression Data using Quad Tree based Expectation Maximization Approach

In molecular biology, microarrays are used to monitor the expression levels of many genes simultaneously. They are applied in gene expression analysis, genome mapping, toxicity studies, pathogen identification, and other biological applications. Clustering groups similar gene expression data together, which helps identify relationships between genes, co-expressed genes, and biologically relevant groupings of genes, an important research area in bioinformatics. In this paper, a Quad Tree based Expectation Maximization (EM) algorithm is applied to cluster gene expression data. The Quad Tree is used to initialize the cluster centroids. Starting from these centroids, EM, which computes maximum likelihood estimates from incomplete samples, groups the data efficiently. Silhouette, a method for interpreting and validating clusters, provides a measure of how well each object lies within its cluster. Experimental results show that the Quad Tree based Expectation Maximization algorithm finds more compact clusters than the K-Means algorithm.
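The pipeline summarized in the abstract (quad tree initialization, EM clustering, silhouette validation) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the data are synthetic 2-D points rather than gene expression profiles, the quad tree splits each cell at the midpoint of its bounding box and takes the centroids of the `k` most populated leaves as initial centers, and EM fits a Gaussian mixture with one spherical variance per component. The names `quadtree_leaves`, `quadtree_centers`, `em_spherical`, and `silhouette` are hypothetical.

```python
import math
import random

def quadtree_leaves(points, max_points, depth=0, max_depth=8):
    """Split a cell into quadrants until it holds at most max_points points.
    Returns (point count, centroid) for every leaf cell."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if len(points) <= max_points or depth >= max_depth:
        return [(len(points), (sum(xs) / len(xs), sum(ys) / len(ys)))]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    cells = {}
    for p in points:  # only non-empty quadrants are created
        cells.setdefault((p[0] > cx, p[1] > cy), []).append(p)
    leaves = []
    for cell in cells.values():
        leaves.extend(quadtree_leaves(cell, max_points, depth + 1, max_depth))
    return leaves

def quadtree_centers(points, k, max_points):
    """Assumed rule: centroids of the k most populated leaves (may return fewer than k)."""
    leaves = sorted(quadtree_leaves(points, max_points), key=lambda leaf: -leaf[0])
    return [center for _, center in leaves[:k]]

def em_spherical(points, centers, iters=50):
    """EM for a 2-D Gaussian mixture with one spherical variance per component."""
    k, n = len(centers), len(points)
    mus, weights, variances = list(centers), [1.0 / k] * k, [1.0] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in the log domain for stability
        resp = []
        for x, y in points:
            logp = [math.log(weights[j]) - math.log(2 * math.pi * variances[j])
                    - ((x - mus[j][0]) ** 2 + (y - mus[j][1]) ** 2) / (2 * variances[j])
                    for j in range(k)]
            top = max(logp)
            e = [math.exp(v - top) for v in logp]
            total = sum(e)
            resp.append([v / total for v in e])
        # M-step: re-estimate weight, mean, and variance of each component
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sq = sum(r[j] * ((p[0] - mx) ** 2 + (p[1] - my) ** 2)
                     for r, p in zip(resp, points))
            mus[j] = (mx, my)
            variances[j] = max(sq / (2 * nj), 1e-6)  # 2 = data dimension
            weights[j] = max(nj / n, 1e-12)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return mus, labels

def silhouette(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) with Euclidean distance."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for c, other in clusters.items() if c != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data (assumption): two well-separated 2-D blobs of 30 points each.
rng = random.Random(0)
data = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
        + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)])
init = quadtree_centers(data, k=2, max_points=40)
mus, labels = em_spherical(data, init)
score = silhouette(data, labels)
print("initial centers:", init)
print("mean silhouette:", score)
```

A mean silhouette close to 1 indicates compact, well-separated clusters, which is the sense in which the abstract compares the approach with K-Means. Real gene expression data are high-dimensional, so a quad tree would need either a 2-D projection or a higher-dimensional analogue; that choice is left open here.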

Leela Rani.P | Rajalakshmi.P
