Paper information - Finding biclusters by random projections

Finding biclusters by random projections

Given a matrix X composed of symbols, a bicluster is a submatrix of X obtained by removing some of the rows and some of the columns of X in such a way that each row of what is left reads the same string. In this paper, we are concerned with the problem of finding the bicluster with the largest area in a large matrix X. The problem is first proved to be NP-complete. We present a fast and efficient randomized algorithm that discovers the largest bicluster by random projections. A detailed probabilistic analysis of the algorithm and an asymptotic study of the statistical significance of the solutions are given. We report results of extensive simulations on synthetic data.
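The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm as published, only one plausible reading of "random projections": repeatedly sample a few columns, group the rows that read the same string on those columns, and then keep every column on which a group agrees. The function name `largest_bicluster_by_projection` and the parameters `k` (projection size), `iterations`, and `seed` are assumptions made for this illustration.

```python
import random
from collections import defaultdict


def largest_bicluster_by_projection(X, k, iterations, seed=0):
    """Return (rows, cols) of the largest-area bicluster found.

    X is a list of equal-length rows of symbols (strings or lists).
    This is a heuristic sketch: it may miss the true optimum.
    """
    rng = random.Random(seed)
    n_cols = len(X[0])
    best_rows, best_cols = [], []
    for _ in range(iterations):
        # Random projection: look only at k randomly chosen columns.
        sample = rng.sample(range(n_cols), k)
        # Group rows that read the same string on the sampled columns.
        groups = defaultdict(list)
        for i, row in enumerate(X):
            groups[tuple(row[j] for j in sample)].append(i)
        for rows in groups.values():
            # Keep every column on which all rows of the group agree,
            # so each remaining row reads the same string.
            cols = [j for j in range(n_cols)
                    if all(X[i][j] == X[rows[0]][j] for i in rows)]
            if len(rows) * len(cols) > len(best_rows) * len(best_cols):
                best_rows, best_cols = rows, cols
    return best_rows, best_cols


if __name__ == "__main__":
    # Rows 0-2 share the string "abc" on columns 0-2.
    X = ["abcx", "abcy", "abcz", "bcax"]
    print(largest_bicluster_by_projection(X, k=1, iterations=200))
```

On this toy matrix, any iteration that samples one of the first three columns groups rows 0 to 2 together and recovers the 3 x 3 bicluster. In general, the number of iterations needed depends on `k` and on the size of the bicluster, which is the kind of trade-off the paper's probabilistic analysis addresses.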

Wojciech Szpankowski | Stefano Lonardi | Qiaofeng Yang
