Mib: Using Mutual Information for Biclustering High Dimensional Data

Most biclustering algorithms for gene expression data are based on either the Euclidean distance or the correlation coefficient, both of which capture only linear relationships. However, nonlinear relationships may exist between genes in gene expression data. Mutual information between two variables provides a more general criterion for investigating dependencies among variables. In this paper, we propose an algorithm that uses mutual information for biclustering gene expression data. We present experimental results on synthetic data. Our algorithm reports biclusters in this synthetic data that none of the distance-based biclustering algorithms identify. In the future, we intend to apply our algorithm to gene expression data.
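
To illustrate why mutual information is a more general dependency criterion than the correlation coefficient, the following minimal sketch compares the two on a purely nonlinear (quadratic) relationship. It is not the biclustering algorithm proposed in the paper. The equal-width histogram estimator, the bin count of 8, the sample size, and all function names are assumptions chosen only for illustration.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def discretize(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant sequence
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X; Y) in bits.

    Assumption: equal-width binning; other estimators exist and
    the result depends on the number of bins.
    """
    n = len(x)
    bx = discretize(x, bins)
    by = discretize(y, bins)
    count_x = Counter(bx)
    count_y = Counter(by)
    count_xy = Counter(zip(bx, by))
    mi = 0.0
    for (i, j), c in count_xy.items():
        # p(i, j) * log2( p(i, j) / (p(i) * p(j)) )
        mi += (c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
    return mi


rng = random.Random(0)  # fixed seed so the comparison is repeatable
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
y_quadratic = [v * v for v in x]  # fully determined by x, but not linearly
y_independent = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

print("Pearson r, quadratic:    ", pearson(x, y_quadratic))
print("MI (bits), quadratic:    ", mutual_information(x, y_quadratic))
print("MI (bits), independent:  ", mutual_information(x, y_independent))
```

In this sketch the correlation coefficient of the quadratic pair stays close to zero because the relationship is symmetric about zero, while the mutual information estimate is clearly larger than that of the independent pair.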
