Mining Positive and Negative Co-regulation Patterns from Microarray Data

Pattern-based and tendency-based models are currently popular for clustering co-regulated genes. In this paper, we propose a novel model, g-Cluster, with the following advantages: (1) it finds positively and negatively co-regulated genes at once, (2) it removes the restriction that genes be related by a magnitude transformation, and (3) it guarantees the quality of clusters and the significance of regulations by means of a novel similarity measure, gCode, and two user-specified thresholds, the wave constraint threshold and the regulation threshold. We also design a novel tree-based clustering algorithm, FBTD, which is combined with efficient pruning rules to identify all maximal g-Clusters. Extensive experiments on real and synthetic datasets show that (1) our algorithm effectively and efficiently finds many co-regulated gene clusters missed by previous models, which are potentially of high biological significance, and (2) our algorithm is superior to the existing approaches.
