Markov chain correlation based clustering of gene expression data

An efficient Markov chain correlation based clustering method (MCC) is proposed for clustering gene expression data. The gene expression data are first normalized, and Markov chains (MC) are constructed from the dynamics of the gene expressions, so that the behavior of the genes at each step of the experiment is taken into account. Based on the correlation of the one-step Markov chain transition probabilities, an agglomerative method is then used to group the series that behave similarly at each step. The proposed MCC method has been applied to four gene expression datasets to obtain a number of clusters. The results show that MCC outperforms the commonly used K-means method and produces clusters that are more meaningful in terms of the similarity of the grouped genes. A further advantage of the proposed method over existing clustering methods is that the number of clusters need not be known in advance.
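
The pipeline described above (normalize, build a Markov chain per gene, correlate the one-step transition probabilities, merge agglomeratively) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how expression values are mapped to Markov states, which linkage is used, or when merging stops. The three-state discretization (cut at 0.5 standard deviations), the average linkage, the stopping threshold of 0.8, and all function names below are assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscore(series):
    """Normalize one expression series to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def to_states(series, cut=0.5):
    """Assumed discretization: 0 = low, 1 = mid, 2 = high expression."""
    return [0 if z < -cut else 2 if z > cut else 1 for z in zscore(series)]


def transition_probs(states, n_states=3):
    """Flattened one-step transition probability matrix of a state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # States that are never left keep an all-zero row (an assumption).
        probs.extend(c / total if total else 0.0 for c in row)
    return probs


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    mu_u, mu_v = mean(u), mean(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / (var_u * var_v) ** 0.5


def mcc_cluster(expression, threshold=0.8):
    """Average-linkage agglomeration on transition-probability correlations.

    Merging stops once no pair of clusters has an average correlation of at
    least `threshold`, so the number of clusters is not supplied up front.
    """
    vectors = [transition_probs(to_states(s)) for s in expression]
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        return mean(pearson(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (a, b), best = max(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair_score: pair_score[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return clusters


# Toy data (made up for illustration): two rising and two falling profiles.
genes = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [6, 5, 4, 3, 2, 1],
    [12, 10, 8, 6, 4, 2],
]
print(mcc_cluster(genes))  # [[0, 1], [2, 3]]
```

In this toy run the rising and falling profiles end up in separate groups without a cluster count being given. The correlation threshold takes over that role in the sketch, and its value would need to be tuned for real data.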