A New Strategy of Geometrical Biclustering for Microarray Data Analysis

In this paper, we present a new biclustering algorithm that provides a geometrical interpretation of similar microarray gene expression profiles. Unlike standard clustering analyses, biclustering classifies the rows and columns of a data matrix simultaneously. Its main objective is to reveal submatrices in which a subset of genes exhibits a consistent pattern over a subset of conditions. However, the search for such subsets is computationally complex. We propose a new algorithm that identifies these patterns by applying the Hough transform in the column-pair space. The algorithm is especially suitable for the biclustering analysis of large-scale microarray data. Our simulation studies show that the method is robust to noise and computationally efficient. Furthermore, we have applied it to a large database of gene expression profiles of multiple human organs, and the resulting biclusters show clear biological meaning.
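To make the column-pair idea concrete, the following is a minimal sketch of a standard Hough line detector applied to a single pair of columns. It is an illustration under stated assumptions, not the algorithm of the paper: it assumes that genes sharing a consistent pattern over two conditions fall near a common line in the plane spanned by those two columns, and the function name `hough_line_rows` and the parameters `n_theta`, `rho_res`, and `tol` are hypothetical.

```python
import numpy as np

def hough_line_rows(x, y, n_theta=180, rho_res=0.1, tol=0.15):
    """Find the most-voted line rho = x*cos(theta) + y*sin(theta) for one
    column pair and return (theta, rho, indices of rows near that line).
    Hypothetical helper: a generic Hough transform, not the paper's method."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)

    # rho value of every row (point) for every candidate angle
    rhos = np.outer(x, np.cos(thetas)) + np.outer(y, np.sin(thetas))

    # Quantize rho into accumulator bins that cover [-rho_max, rho_max]
    rho_max = np.abs(rhos).max() + rho_res
    n_rho = int(np.ceil(2 * rho_max / rho_res)) + 1
    bins = np.round((rhos + rho_max) / rho_res).astype(int)

    # Each row casts one vote per angle
    acc = np.zeros((n_theta, n_rho), dtype=int)
    theta_idx = np.broadcast_to(np.arange(n_theta), bins.shape)
    np.add.at(acc, (theta_idx, bins), 1)

    # The accumulator peak is the line supported by the most rows
    t, r = np.unravel_index(acc.argmax(), acc.shape)
    theta = thetas[t]
    rho = r * rho_res - rho_max

    # Rows whose points lie within tol of the detected line
    dist = np.abs(x * np.cos(theta) + y * np.sin(theta) - rho)
    return theta, rho, np.flatnonzero(dist <= tol)

# Toy column pair: 20 rows follow y = x + 2, 30 background rows lie on a circle
x_line = np.linspace(0.0, 5.0, 20)
angles = np.linspace(0.0, 2 * np.pi, 30, endpoint=False)
x = np.concatenate([x_line, 5 + 4 * np.cos(angles)])
y = np.concatenate([x_line + 2, 5 + 4 * np.sin(angles)])
theta, rho, rows = hough_line_rows(x, y)
```

In this toy example the first 20 rows share an additive pattern across the two columns, so they vote for the same accumulator cell while the background rows spread their votes. A full biclustering procedure would need to repeat such a search over many column pairs and combine the resulting row sets, and that step is not specified here.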
