Bayesian Gaussian Copula Factor Models for Mixed Data

Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables, the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem, we propose a novel class of Bayesian Gaussian copula factor models that decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this article are implemented in the R package bfa (available from http://stat.duke.edu/jsm38/software/bfa). Supplementary materials for this article are available online.
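The decoupling of latent factors from marginal distributions can be illustrated with a small simulation. The following is a minimal sketch, not the bfa implementation: it assumes the usual Gaussian copula factor construction in which a latent vector z = Lambda * eta + eps (standard normal eta and eps) is rescaled to unit variance, mapped to uniforms through the normal CDF, and then pushed through arbitrary marginal inverse CDFs. The function name `simulate_copula_factor`, the loadings, and the two marginals are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist


def simulate_copula_factor(loadings, inv_cdfs, n, seed=0):
    """Draw n rows of mixed data from a Gaussian copula factor model (sketch).

    loadings: p rows of k factor loadings (hypothetical values).
    inv_cdfs: p functions mapping u in (0, 1) to the observed scale.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    k = len(loadings[0])
    # Marginal standard deviation of each latent z_j is sqrt(1 + sum of squared loadings).
    scales = [math.sqrt(1.0 + sum(l * l for l in row)) for row in loadings]
    data = []
    for _ in range(n):
        eta = [rng.gauss(0.0, 1.0) for _ in range(k)]  # latent factors
        row = []
        for lam, scale, inv_cdf in zip(loadings, scales, inv_cdfs):
            z = sum(l * e for l, e in zip(lam, eta)) + rng.gauss(0.0, 1.0)
            u = std_normal.cdf(z / scale)  # uniform marginal
            u = min(max(u, 1e-12), 1.0 - 1e-12)  # guard against exact 0 or 1
            row.append(inv_cdf(u))
        data.append(row)
    return data


# Assumed marginals: a unit-rate exponential and a Bernoulli(0.3) variable.
inv_cdfs = [
    lambda u: -math.log(1.0 - u),
    lambda u: int(u > 0.7),
]
loadings = [[1.5], [-0.8]]  # one factor, two variables (illustrative values)
sample = simulate_copula_factor(loadings, inv_cdfs, n=4000, seed=1)

exp_mean = sum(r[0] for r in sample) / len(sample)
bern_mean = sum(r[1] for r in sample) / len(sample)
```

Under these assumptions, changing the loadings alters the dependence between the two columns but leaves each column's marginal distribution fixed, which is the property the abstract describes.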
