Multi-way clustering of microarray data using probabilistic sparse matrix factorization

MOTIVATION: We address the problem of multi-way clustering of microarray data using a generative model. Our algorithm, probabilistic sparse matrix factorization (PSMF), is a probabilistic extension of a previous hard-decision algorithm for this problem. PSMF allows for varying levels of sensor noise in the data, uncertainty in the hidden prototypes used to explain the data, and uncertainty as to which prototypes are selected to explain each data vector.

RESULTS: We present experimental results demonstrating that our method recovers functionally relevant clusterings in mRNA expression data better than standard clustering techniques, including hierarchical agglomerative clustering. We also show that, by computing probabilities instead of point estimates, our method avoids converging to poor solutions.
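
To make the setting concrete, the following is a minimal sketch of the kind of generative process the abstract describes: each data vector is explained by a small number of hidden prototypes, observed under noise whose level varies. It is only an illustration under stated assumptions, not the PSMF model or its inference procedure. The function name, the Gaussian distributions, the per-row noise levels, and the limit of two prototypes per data vector are all hypothetical choices made for this example.

```python
import numpy as np

def sample_sparse_factorization(n_genes=6, n_conditions=5, n_prototypes=4,
                                max_active=2, seed=0):
    """Draw a toy data matrix X = Y @ Z + noise where each row of Y is sparse.

    All distributions and sizes here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hidden prototypes: each row of Z is one prototype expression profile.
    Z = rng.normal(size=(n_prototypes, n_conditions))

    # Sparse weights: each gene selects between 1 and `max_active` prototypes.
    Y = np.zeros((n_genes, n_prototypes))
    for g in range(n_genes):
        k = int(rng.integers(1, max_active + 1))
        chosen = rng.choice(n_prototypes, size=k, replace=False)
        Y[g, chosen] = rng.normal(size=k)

    # Varying noise: a different standard deviation for each gene (assumed).
    noise_std = rng.uniform(0.05, 0.5, size=n_genes)
    noise = rng.normal(size=(n_genes, n_conditions)) * noise_std[:, None]

    X = Y @ Z + noise
    return X, Y, Z, noise_std

X, Y, Z, noise_std = sample_sparse_factorization()
# Genes that share a selected prototype belong to the same (overlapping) cluster.
memberships = Y != 0
```

Because a row of `Y` may have more than one nonzero entry, a gene can belong to several clusters at once, which is one way to read "multi-way clustering". A probabilistic treatment would keep distributions over `Y`, `Z`, and the selections rather than the single sampled values used here.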
