Clustering Gene Expression Data using Quad Tree based Expectation Maximization Approach

In molecular biology, microarrays are used to monitor the expression levels of many genes simultaneously. Microarrays are applied in gene expression analysis, genome mapping, toxicity studies, pathogen identification and other biological applications. Clustering is a useful technique for analysing gene expression data: genes with similar expression profiles are grouped together so that relationships between them can be identified. Clustering of gene expression data helps to identify co-expressed genes and biologically relevant groupings of genes, and it is an important research area in Bioinformatics. In this paper, a Quad Tree based Expectation Maximization (EM) algorithm is applied to cluster gene expression data. A Quad Tree is used to initialize the cluster centroids, and EM, starting from these centroids, groups the data efficiently. EM computes maximum likelihood estimates of model parameters from incomplete samples. Cluster quality is assessed with the silhouette, a method for interpreting and validating clusters that measures how well each object lies within its cluster. Experimental results show that the Quad Tree based EM algorithm finds more compact clusters than the K-Means algorithm.
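The pipeline described above (quad-tree seeding, EM refinement, silhouette validation) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper. The abstract does not say how the Quad Tree selects centroids, so the rule used here is hypothetical: split the bounding box until each cell holds at most `capacity` points, then take the means of the `k` most populated leaves. The mixture is assumed to consist of isotropic Gaussians, and the points are assumed to be two-dimensional, whereas real expression profiles have many more dimensions. All function names (`quadtree_leaves`, `quadtree_centroids`, `em_isotropic`, `silhouette`) are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumption: points already reduced to two dimensions


def quadtree_leaves(points: Sequence[Point], capacity: int = 8,
                    depth: int = 0, max_depth: int = 10) -> List[List[Point]]:
    """Recursively split the bounding box into four quadrants until a cell
    holds at most `capacity` points or `max_depth` is reached."""
    if not points:
        return []
    if len(points) <= capacity or depth >= max_depth:
        return [list(points)]
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    mid_x, mid_y = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    quadrants: List[List[Point]] = [[], [], [], []]
    for x, y in points:
        quadrants[int(x > mid_x) + 2 * int(y > mid_y)].append((x, y))
    leaves: List[List[Point]] = []
    for quadrant in quadrants:
        leaves.extend(quadtree_leaves(quadrant, capacity, depth + 1, max_depth))
    return leaves


def quadtree_centroids(points: Sequence[Point], k: int, capacity: int = 8) -> List[Point]:
    """Hypothetical seeding rule: means of the k most populated quad-tree leaves."""
    leaves = sorted(quadtree_leaves(points, capacity), key=len, reverse=True)
    if len(leaves) < k:
        raise ValueError("quad tree produced fewer than k non-empty leaves")
    return [(sum(p[0] for p in leaf) / len(leaf), sum(p[1] for p in leaf) / len(leaf))
            for leaf in leaves[:k]]


def em_isotropic(points: Sequence[Point], init_means: Sequence[Point],
                 iters: int = 50, var_floor: float = 1e-6) -> Tuple[List[Point], List[int]]:
    """EM for a mixture of isotropic 2-D Gaussians started from init_means.
    Returns the fitted means and hard labels (most responsible component)."""
    n, k = len(points), len(init_means)
    means = [tuple(m) for m in init_means]
    weights, variances = [1.0 / k] * k, [1.0] * k
    resp: List[List[float]] = []
    for _ in range(max(1, iters)):
        # E-step: posterior responsibilities via the log-sum-exp trick.
        resp = []
        for x, y in points:
            logs = [math.log(weights[j]) - math.log(2 * math.pi * variances[j])
                    - ((x - means[j][0]) ** 2 + (y - means[j][1]) ** 2) / (2 * variances[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate each component's weight, mean and variance.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # (near-)empty component: keep its previous parameters
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sq = sum(r[j] * ((p[0] - mx) ** 2 + (p[1] - my) ** 2)
                     for r, p in zip(resp, points))
            means[j], weights[j] = (mx, my), nj / n
            variances[j] = max(sq / (2 * nj), var_floor)  # divide by 2 = dimensionality
    labels = [r.index(max(r)) for r in resp]
    return list(means), labels


def _mean_dist(i: int, points: Sequence[Point], labels: Sequence[int], cluster: int) -> float:
    """Mean distance from points[i] to the other members of `cluster`."""
    ds = [math.dist(points[i], q) for j, q in enumerate(points)
          if labels[j] == cluster and j != i]
    return sum(ds) / len(ds)


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette width, s(i) = (b - a) / max(a, b); singletons score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i in range(len(points)):
        own = labels[i]
        if list(labels).count(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = _mean_dist(i, points, labels, own)  # cohesion
        b = min(_mean_dist(i, points, labels, c) for c in clusters if c != own)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Consider two well-separated groups of five points centred at (0.5, 0.5) and (10.5, 10.5). On that data, `quadtree_centroids(data, k=2)` returns exactly those two group means as seeds, EM keeps the groups apart, and `silhouette` gives a mean width close to 1. The E-step uses log-sum-exp so that responsibilities do not underflow when a point lies far from a component. A K-Means baseline for the comparison reported in the abstract is not sketched here.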