gTRICLUSTER: A More General and Effective 3D Clustering Algorithm for Gene-Sample-Time Microarray Data

Clustering is an important technique in microarray data analysis, and mining three-dimensional (3D) clusters in gene-sample-time (GST) microarray data is emerging as an active research topic in this area. A 3D cluster consists of a subset of genes that are coherent on a subset of samples along a segment of the time series. Such coherent clusters may help users identify useful phenotypes, the genes potentially related to those phenotypes, and their expression rules. TRICLUSTER is the state-of-the-art 3D clustering algorithm for GST microarray data. In this paper, we propose a new algorithm for mining 3D clusters in GST microarray data. We call the new algorithm gTRICLUSTER because it is based on a more general 3D cluster model than the one underlying TRICLUSTER. gTRICLUSTER can find more biologically meaningful coherent gene clusters than TRICLUSTER, and it is also more robust to noise. Experimental results on a real-world microarray dataset validate the effectiveness of the proposed algorithm.
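
To make the shape of a 3D cluster concrete, the following minimal Python sketch selects a subset of genes, a subset of samples, and a contiguous segment of time points from a GST dataset. It illustrates only the structure described above. It does not implement gTRICLUSTER or its coherence model, and the nested-list layout data[g][s][t], the function name extract_3d_cluster, and the toy values are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical layout: data[g][s][t] is the expression level of gene g
# in sample s at time point t.
Tensor3D = List[List[List[float]]]


def extract_3d_cluster(
    data: Tensor3D,
    genes: Sequence[int],
    samples: Sequence[int],
    t_start: int,
    t_end: int,
) -> Tensor3D:
    """Return the sub-tensor for the chosen genes and samples over the
    contiguous time segment [t_start, t_end)."""
    n_times = len(data[0][0])
    if not 0 <= t_start < t_end <= n_times:
        raise ValueError("time segment must be a non-empty range within the series")
    # Genes and samples may be arbitrary subsets; time must be one segment.
    return [[data[g][s][t_start:t_end] for s in samples] for g in genes]


# Toy dataset (made-up values): 3 genes x 2 samples x 4 time points.
data = [
    [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]],
    [[0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 3.0, 4.0]],
    [[9.0, 1.0, 7.0, 3.0], [2.0, 8.0, 0.0, 5.0]],
]

# Candidate cluster: genes 0 and 1, both samples, time points 1 to 3.
cluster = extract_3d_cluster(data, genes=[0, 1], samples=[0, 1], t_start=1, t_end=4)
```

Whether such a candidate counts as a coherent cluster depends on the coherence criterion of the cluster model, which this sketch deliberately leaves out.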