Optimal Arrangement of Leaves in the Tree Representing Hierarchical Clustering of Gene Expression Data

In this paper, we study how to present gene expression data so that similarities are easy to see: we seek a linear ordering of genes in which genes with similar expression profiles are close together. In general, finding the best possible ordering is intractable. We therefore assume that hierarchical clustering has been applied to the gene expression profiles, and show that the best ordering that respects the clustering can be computed efficiently. We report experiments comparing the optimal ordering with several other methods. The implementation of the algorithm, a simple program for viewing hierarchically clustered expression array data, and the complete results of our experiments are available at http://monod.uwaterloo.ca/supplements/01expr/.
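To make the idea of an ordering that "respects the clustering" concrete, here is a minimal sketch. It is not the implementation referred to above, and it rests on several assumptions: the clustering is a binary tree given as nested 2-tuples with integer leaves, the only freedom is swapping the two children of any internal node, and the objective is to minimize the sum of distances between adjacent leaves. The function name `optimal_leaf_order` and the `dist` matrix are hypothetical. The straightforward dynamic program below keeps, for each subtree, the cheapest ordering for every (leftmost leaf, rightmost leaf) pair; it favors clarity over speed.

```python
def optimal_leaf_order(tree, dist):
    """Return (cost, order) for the cheapest leaf order consistent with `tree`.

    tree: nested 2-tuples with integer leaves, e.g. ((0, 1), (2, 3))  (assumed format)
    dist: symmetric matrix, dist[i][j] = dissimilarity of leaves i and j (assumed input)
    """
    def solve(node):
        # Maps (leftmost_leaf, rightmost_leaf) -> (cost, order) for this subtree.
        if isinstance(node, int):
            return {(node, node): (0.0, (node,))}
        a, b = solve(node[0]), solve(node[1])
        best = {}
        # Try both child orders: left subtree first, then right subtree first.
        for first, second in ((a, b), (b, a)):
            for (l, m), (c1, o1) in first.items():
                for (k, r), (c2, o2) in second.items():
                    # Joining the two blocks makes leaves m and k adjacent.
                    cost = c1 + dist[m][k] + c2
                    if (l, r) not in best or cost < best[(l, r)][0]:
                        best[(l, r)] = (cost, o1 + o2)
        return best

    cost, order = min(solve(tree).values())
    return cost, list(order)


# Four genes placed on a line at positions 0, 10, 1, 11; distance = absolute difference.
positions = [0, 10, 1, 11]
dist = [[abs(p - q) for q in positions] for p in positions]
cost, order = optimal_leaf_order(((0, 1), (2, 3)), dist)
```

In this toy input the tree forces leaves 0 and 1 to stay together, and likewise leaves 2 and 3, so the only choices are which child comes first at each internal node. The sketch enumerates all endpoint pairs at every node, which is considerably slower than a carefully engineered algorithm would be.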
