A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data.

Biclustering is an important tool in microarray analysis when only a subset of genes is co-regulated under a subset of conditions. Unlike standard clustering analyses, biclustering classifies simultaneously in the gene and condition directions of a microarray data matrix. However, the biclustering problem is inherently intractable and computationally complex. In this paper, we present a new biclustering algorithm based on a geometric view of coherent gene expression profiles, in which pattern identification is performed with the Hough transform in a column-pair space. The algorithm is especially suitable for the biclustering analysis of large-scale microarray data. Our studies show that the approach can discover significant biclusters as the noise level and regulatory complexity increase. Furthermore, we test the ability of our method to locate biologically verifiable biclusters within an annotated set of genes.
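
The idea of a Hough transform in a column-pair space can be illustrated with a minimal sketch. It is not the algorithm of the paper, only an illustration under stated assumptions: two conditions (columns) are taken as the axes of a plane, each gene is a point in that plane, and genes with coherent expression are assumed to fall near a common line. The function names, the (theta, rho) line parametrization, and the accumulator resolution are all choices made for this example.

```python
import numpy as np

def hough_line_votes(x, y, n_theta=180, n_rho=200):
    """Accumulate votes for lines rho = x*cos(theta) + y*sin(theta).

    x, y: expression values of the same genes under two conditions
    (one column pair). n_theta and n_rho are illustrative resolutions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # |rho| can never exceed the distance of the farthest point from the origin.
    rho_max = np.hypot(np.abs(x).max(), np.abs(y).max())
    rho_edges = np.linspace(-rho_max, rho_max, n_rho + 1)
    # rho of every gene (rows) for every candidate angle (columns).
    rhos = np.outer(x, np.cos(thetas)) + np.outer(y, np.sin(thetas))
    bins = np.clip(np.digitize(rhos, rho_edges) - 1, 0, n_rho - 1)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for t in range(n_theta):
        acc[t] = np.bincount(bins[:, t], minlength=n_rho)
    return acc, thetas, rho_edges, bins

def genes_on_best_line(x, y, **kwargs):
    """Return indices of the genes that voted for the strongest line."""
    acc, _, _, bins = hough_line_votes(x, y, **kwargs)
    t, r = np.unravel_index(np.argmax(acc), acc.shape)
    return np.flatnonzero(bins[:, t] == r)
```

Calling `genes_on_best_line` on one column pair returns a candidate gene set for that pair. How such per-pair results would be combined across many column pairs into full biclusters is not described in the abstract and is left out of the sketch.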
