Bayesian inference on group differences in multivariate categorical data

Multivariate categorical data are common in many fields. We are motivated by election poll studies assessing whether voters' opinions change with their candidate preferences in the 2016 United States Presidential primaries or caucuses. Similar goals arise routinely in several applications, but the current literature lacks a general methodology that combines flexibility, efficiency, and tractability in testing for group differences in multivariate categorical data at different, potentially complex, scales. We address this goal by leveraging a Bayesian representation that factorizes the joint probability mass function of the group variable and the multivariate categorical data as the product of the marginal probabilities of the groups and the conditional probability mass function of the multivariate categorical data given group membership. To enhance flexibility, we define this conditional probability mass function via a group-dependent mixture of tensor factorizations. This facilitates dimensionality reduction and borrowing of information, while providing tractable computational procedures and accurate tests of global and local group differences. We compare our methods with popular competitors and discuss their improved performance in simulations and in American election poll studies.
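The factorization described above, p(x, y) = p(x) p(y | x) with x the group variable and y = (y_1, ..., y_p) the categorical responses, can be illustrated with a short numerical sketch. It assumes a specific form that the text does not state: each group's conditional pmf is a latent class (PARAFAC-type) tensor factorization whose class weights pi_g depend on the group, while the class profiles psi are shared across groups as one simple way to borrow information. The total variation distances are a descriptive stand-in for "global" (whole vector y) and "local" (single variable y_j) group differences, not the Bayesian tests developed in the paper, and all function names and numbers are hypothetical.

```python
import numpy as np


def latent_class_tensor(pi, psi):
    """PMF tensor sum_h pi[h] * psi[0][h] (x) psi[1][h] (x) ... (x) psi[p-1][h].

    pi  : shape (H,), latent class weights (non-negative, summing to 1)
    psi : list of p arrays; psi[j] has shape (H, d_j), each row a PMF over
          the d_j categories of variable j given latent class h
    """
    t = np.asarray(pi, dtype=float)  # shape (H,)
    for psi_j in psi:
        psi_j = np.asarray(psi_j, dtype=float)
        # Reshape psi_j to (H, 1, ..., 1, d_j) so the outer product adds a new
        # trailing category axis while keeping the latent class axis aligned.
        shape = (psi_j.shape[0],) + (1,) * (t.ndim - 1) + (psi_j.shape[1],)
        t = t[..., None] * psi_j.reshape(shape)
    return t.sum(axis=0)  # sum out the latent class: shape (d_1, ..., d_p)


def joint_pmf(group_probs, group_pis, psi):
    """Stack p(x = g, y) = p(x = g) * p(y | x = g) along axis 0.

    Assumption: group-specific class weights, class profiles psi shared.
    """
    conds = np.stack([latent_class_tensor(pi_g, psi) for pi_g in group_pis])
    w = np.asarray(group_probs, dtype=float)
    return w.reshape((-1,) + (1,) * (conds.ndim - 1)) * conds


def total_variation(p, q):
    """Total variation distance between two PMFs of the same shape."""
    return 0.5 * np.abs(p - q).sum()


def group_differences(joint):
    """Global and per-variable (local) TV distances between groups 0 and 1."""
    # Recover p(y | x = g) by normalizing each group's slice of the joint.
    cond = joint / joint.sum(axis=tuple(range(1, joint.ndim)), keepdims=True)
    p = cond.ndim - 1  # number of categorical variables
    global_tv = total_variation(cond[0], cond[1])
    local_tv = []
    for j in range(p):
        # Marginal PMF of y_j within each group: sum over all other variables.
        other = tuple(k for k in range(p) if k != j)
        local_tv.append(
            total_variation(cond[0].sum(axis=other), cond[1].sum(axis=other))
        )
    return global_tv, local_tv


# Hypothetical example: H = 3 latent classes, p = 3 variables with 2, 3, 4 categories.
rng = np.random.default_rng(0)
H, dims = 3, (2, 3, 4)
psi = [rng.dirichlet(np.ones(d), size=H) for d in dims]  # shared class profiles
group_pis = [np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.3, 0.5])]
group_probs = [0.4, 0.6]  # p(x = g)

joint = joint_pmf(group_probs, group_pis, psi)
global_tv, local_tv = group_differences(joint)
print(joint.shape, round(float(joint.sum()), 10))
print(global_tv, local_tv)
```

Because the class profiles are shared in this sketch, the two groups differ only through their class weights; giving both groups the same weights makes every distance zero, which corresponds to the global null of no group difference. Marginalizing cannot increase total variation, so each local distance is bounded by the global one.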
