Bayesian group latent factor analysis with structured sparsity

Latent factor models are the canonical statistical tool for exploratory analysis of low-dimensional linear structure in an observation matrix with p features across n samples. We develop a structured Bayesian group factor analysis model that extends the factor model to multiple coupled observation matrices; in the case of two observation matrices, this reduces to a Bayesian model of canonical correlation analysis. The main contribution of this work is to carefully define a structured Bayesian prior that encourages both element-wise and column-wise shrinkage and leads to desirable behavior on high-dimensional data. In particular, our model puts a structured prior on the joint factor loading matrix, regularizing at three levels, which enables element-wise sparsity and unsupervised recovery of latent factors corresponding to structured variance across arbitrary subsets of the observations. In addition, our structured prior allows for both dense and sparse latent factors, so that covariation among all features or among only a subset of features can be recovered. We use fast parameter-expanded expectation-maximization for parameter estimation in this model. We validate our method on both simulated data with substantial structure and real data, comparing against a number of state-of-the-art approaches. These results illustrate useful properties of our model, including i) recovery of sparse signal in the presence of dense effects; ii) the ability to scale naturally to large numbers of observations; iii) flexible observation- and factor-specific regularization to recover factors with a wide variety of sparsity levels and proportions of variance explained; and iv) tractable inference that scales to modern genomic and document data sizes.
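To make the structure described above concrete, the following is a minimal simulation sketch, not the model or prior from this work. It assumes a plain linear-Gaussian group factor model in which several observation matrices (each p_j features by n samples) share one factor matrix, the joint loading matrix is zeroed column-wise per observation matrix so that each factor is active in only a subset of them, and some factors are additionally sparse element-wise while others are dense. The function name, the Bernoulli masks, and all parameter values are hypothetical choices for illustration; the three-level shrinkage prior and the parameter-expanded EM estimation are not implemented here.

```python
import numpy as np

def simulate_group_factor_data(n=50, dims=(20, 30, 25), k=4,
                               sparsity=0.8, noise_sd=0.1, seed=0):
    """Simulate coupled observation matrices sharing k latent factors.

    Hypothetical illustration only: sparsity comes from simple Bernoulli
    masks, not from the structured prior described in the abstract.
    """
    rng = np.random.default_rng(seed)
    m = len(dims)

    # activity[j, h] is True when factor h loads on observation matrix j.
    activity = rng.random((m, k)) < 0.5
    # Guarantee that every factor is active in at least one matrix.
    activity[rng.integers(m, size=k), np.arange(k)] = True

    # Each factor is either dense (all features) or sparse (a subset).
    dense = rng.random(k) < 0.5

    blocks = []
    for j, p in enumerate(dims):
        block = rng.normal(size=(p, k))
        keep = rng.random((p, k)) >= sparsity  # element-wise sparsity mask
        keep[:, dense] = True                  # dense factors keep all entries
        block *= keep
        block *= activity[j]                   # column-wise (group) zeroing
        blocks.append(block)

    loadings = np.vstack(blocks)               # joint loading matrix, sum(dims) x k
    factors = rng.normal(size=(k, n))          # shared latent factors, k x n
    noise = noise_sd * rng.normal(size=(sum(dims), n))
    stacked = loadings @ factors + noise

    # Split the stacked matrix back into one p_j x n matrix per group.
    views = np.split(stacked, np.cumsum(dims)[:-1], axis=0)
    return views, loadings, factors, activity

views, loadings, factors, activity = simulate_group_factor_data()
print([v.shape for v in views])
print(activity.astype(int))
```

With two entries in `dims`, the same sketch produces data of the kind a canonical correlation analysis would be applied to: columns of `activity` that are active in both matrices play the role of shared factors, and the remaining columns capture matrix-specific variation.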
