Fast Bayesian Factor Analysis via Automatic Rotations to Sparsity

ABSTRACT Rotational post hoc transformations have traditionally played a key role in enhancing the interpretability of factor analysis. Regularization methods serve the same goal by prioritizing sparse loading matrices. In this work, we bridge these two paradigms with a unifying Bayesian framework. Our approach deploys intermediate factor rotations throughout the learning process, greatly enhancing the effectiveness of sparsity-inducing priors. These automatic rotations to sparsity are embedded within a PXL-EM algorithm, a Bayesian variant of parameter-expanded EM for posterior mode detection. By iterating between soft-thresholding of small factor loadings and transformations of the factor basis, we obtain (a) dramatic accelerations, (b) robustness against poor initializations, and (c) better-oriented sparse solutions. To avoid prespecifying the factor cardinality, we extend the loading matrix to have infinitely many columns with the Indian buffet process (IBP) prior. The factor dimensionality is learned from the posterior, which is shown to concentrate on sparse matrices. Our deployment of PXL-EM performs a dynamic posterior exploration, outputting a solution path indexed by a sequence of spike-and-slab priors. For accurate recovery of the factor loadings, we deploy the spike-and-slab LASSO prior, a two-component refinement of the Laplace prior. A companion criterion, motivated as an integral lower bound, is provided to select the best recovery. The potential of the proposed procedure is demonstrated on both simulated and real high-dimensional data, for which posterior simulation would be impractical. Supplementary materials for this article are available online.
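As a rough illustration of two ingredients named in the abstract, the sketch below writes the spike-and-slab LASSO prior as a mixture of two Laplace densities and implements a plain soft-thresholding operator. It is a minimal sketch under stated assumptions, not the article's PXL-EM updates: the function names and the parameters `theta` (slab weight), `lam_spike`, and `lam_slab` (Laplace rates, with `lam_spike` much larger than `lam_slab`) are introduced here for illustration and do not appear in the abstract.

```python
import numpy as np


def laplace_pdf(x, lam):
    """Laplace (double-exponential) density with rate lam: (lam / 2) * exp(-lam * |x|)."""
    return 0.5 * lam * np.exp(-lam * np.abs(x))


def ssl_pdf(x, theta, lam_spike, lam_slab):
    """Two-component Laplace mixture (hypothetical parameter names).

    The spike (large rate) concentrates mass near zero; the slab (small rate)
    leaves room for large loadings. theta is the prior weight of the slab.
    """
    spike = laplace_pdf(x, lam_spike)
    slab = laplace_pdf(x, lam_slab)
    return (1.0 - theta) * spike + theta * slab


def soft_threshold(x, t):
    """Plain soft-thresholding: shrink each entry toward zero by t,
    setting entries with |x| <= t exactly to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


if __name__ == "__main__":
    loadings = np.array([-2.0, 0.3, 1.5])
    print(soft_threshold(loadings, 0.5))
    print(ssl_pdf(loadings, theta=0.5, lam_spike=20.0, lam_slab=1.0))
```

With a single Laplace component the shrinkage threshold is the same for every loading; mixing a sharp spike with a diffuse slab is what allows small loadings to be penalized heavily while large ones are left nearly untouched.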
