Models of Random Sparse Eigenmatrices with Application to Bayesian Factor Analysis

We discuss a new class of models for random covariance structures defined by probability distributions over sparse eigenmatrices. The decomposition of orthogonal square matrices in terms of Givens rotations defines a natural, interpretable framework for defining prior distributions over the sparsity structure of random eigenmatrices. We explore some theoretical aspects and implications for conditional independence graphical features arising in Gaussian models, and develop classes of priors over these matrices to underlie Bayesian modeling of resulting variance matrices. We discuss exploratory data analysis in this context, and define and implement Bayesian analysis based on reversible jump Markov chain Monte Carlo. This analysis is extended to the context of multivariate normal mixture models using the novel sparsity priors for variance matrices of the normal components, and this is applied in a study of a 20-dimensional gene expression data set in breast cancer classification.
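
As a rough illustration of the Givens-rotation idea mentioned above, the following sketch builds an orthogonal eigenmatrix as a product of Givens rotations in which most rotation angles are zero, and forms the corresponding variance matrix. It is not the paper's implementation: the helper names `givens` and `eigenmatrix`, the ordering of the rotations, and the specific angles and eigenvalues are all assumptions chosen for illustration.

```python
import numpy as np

def givens(p, i, j, theta):
    """Return the p x p Givens rotation acting on coordinates i and j."""
    G = np.eye(p)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

def eigenmatrix(p, angles):
    """Multiply Givens rotations for the index pairs listed in `angles`.

    `angles` maps pairs (i, j) with i < j to rotation angles. Pairs that are
    absent are treated as zero angles, i.e. identity rotations. The sorted
    ordering of the pairs is an assumption of this sketch.
    """
    E = np.eye(p)
    for (i, j), theta in sorted(angles.items()):
        E = E @ givens(p, i, j, theta)
    return E

p = 4
# Hypothetical sparse configuration: only two of the six possible angles are nonzero.
angles = {(0, 1): 0.6, (2, 3): -1.1}
E = eigenmatrix(p, angles)

# Hypothetical eigenvalues; the variance matrix is E diag(d) E^T.
d = np.array([4.0, 2.0, 1.0, 0.5])
Sigma = E @ np.diag(d) @ E.T
Omega = np.linalg.inv(Sigma)  # precision matrix

print(np.round(E, 3))
print(np.round(Sigma, 3))
```

In this particular configuration the two rotations act on disjoint coordinate pairs, so the eigenmatrix, the variance matrix, and the precision matrix are all block diagonal. This is one simple case of how zero angles can translate into sparsity and conditional independence structure in a Gaussian model.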
