Random walk biclustering for microarray data

A biclustering algorithm is proposed that combines a greedy technique with a local search strategy for escaping poor local minima. Starting from a random initial solution, the algorithm searches for a locally optimal bicluster through successive transformations that improve a gain function. The gain function combines the mean squared residue, the row variance, and the size of the bicluster. Several strategies for escaping local minima are introduced and compared. Experimental results on several microarray data sets show that the method finds biclusters that are significant, including from a biological point of view.
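The search described above can be illustrated with a minimal sketch. It uses the standard Cheng and Church definition of the mean squared residue and a plain definition of row variance. Everything else is an assumption made for illustration and is not taken from the paper: the way the three terms are combined (the `cost` function and its weights `w_var` and `w_size`), the move set (toggling one row or one column), and the noise probability `p_noise` used to escape local minima.

```python
import random


def row_means(data, rows, cols):
    """Mean of each selected row over the selected columns."""
    return {i: sum(data[i][j] for j in cols) / len(cols) for i in rows}


def mean_squared_residue(data, rows, cols):
    """Cheng-Church mean squared residue of the bicluster (rows, cols)."""
    n, m = len(rows), len(cols)
    r_mean = row_means(data, rows, cols)
    c_mean = {j: sum(data[i][j] for i in rows) / n for j in cols}
    g_mean = sum(r_mean.values()) / n
    return sum(
        (data[i][j] - r_mean[i] - c_mean[j] + g_mean) ** 2
        for i in rows for j in cols
    ) / (n * m)


def row_variance(data, rows, cols):
    """Average squared deviation of each entry from its row mean."""
    r_mean = row_means(data, rows, cols)
    return sum(
        (data[i][j] - r_mean[i]) ** 2 for i in rows for j in cols
    ) / (len(rows) * len(cols))


def cost(data, rows, cols, w_var=0.1, w_size=0.1):
    """Hypothetical cost (lower is better): low residue, high variance, large size."""
    volume = len(rows) * len(cols) / (len(data) * len(data[0]))
    return (mean_squared_residue(data, rows, cols)
            - w_var * row_variance(data, rows, cols)
            - w_size * volume)


def neighbours(data, rows, cols):
    """Biclusters reachable by toggling one row or one column (min size 2 x 2)."""
    for i in range(len(data)):
        new_rows = rows ^ {i}
        if len(new_rows) >= 2:
            yield new_rows, cols
    for j in range(len(data[0])):
        new_cols = cols ^ {j}
        if len(new_cols) >= 2:
            yield rows, new_cols


def random_walk_bicluster(data, steps=200, p_noise=0.2, seed=0):
    """Greedy local search with random-walk moves; returns (cost, rows, cols)."""
    rng = random.Random(seed)
    rows = frozenset(rng.sample(range(len(data)), 2))
    cols = frozenset(rng.sample(range(len(data[0])), 2))
    best = (cost(data, rows, cols), rows, cols)
    for _ in range(steps):
        moves = list(neighbours(data, rows, cols))
        if not moves:
            break
        if rng.random() < p_noise:
            # Noise step: accept a random move even if it worsens the cost.
            rows, cols = rng.choice(moves)
        else:
            # Greedy step: take the best neighbour only if it improves the cost.
            cand = min(moves, key=lambda rc: cost(data, rc[0], rc[1]))
            if cost(data, cand[0], cand[1]) < cost(data, rows, cols):
                rows, cols = cand
        current = cost(data, rows, cols)
        if current < best[0]:
            best = (current, rows, cols)
    return best
```

For example, `random_walk_bicluster(matrix, steps=500, p_noise=0.3)` on a list-of-lists expression matrix with at least three rows or three columns returns the lowest-cost bicluster visited. Keeping the best solution seen so far means the noise steps can wander away from a local minimum without losing it.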