Factorgrams: A tool for visualizing multi-way associations in biological data

Effective visualization of biological data is often critical for subsequent analysis. The popular clustergram/dendrogram visualization rearranges the rows and columns of a data matrix to highlight clusters of similar responses, but it assumes that each row or column belongs to only one cluster. It therefore cannot represent multi-way associations, which occur frequently, e.g., when a gene plays multiple biological roles. We describe the 'factorgram' visualization, which rearranges the data into an expanded view that associates each row (or column) with multiple clusters of rows (or columns), elucidating potentially new biological relationships. Factorgrams for mouse gene expression and yeast synthetic-lethal gene-interaction datasets detect more statistically significant clusters than clustergrams, as well as more clusters enriched for gene ontology annotations. Experimentally verified associations that clustergrams fail to group together, and that were previously identified by manual rearrangement of rows and columns, are readily identified by the factorgram.
