Reliable Inference in Highly Stratified Contingency Tables: Using Latent Class Models as Density Estimators

Contingency tables are among the most basic and useful tools for analyzing categorical data, but they produce highly imprecise estimates in small samples or for the population subgroups that arise after repeated stratification. I demonstrate that preprocessing an observed set of categorical variables with a latent class model can greatly improve the quality of table-based inferences. Used as a density estimator, the latent class model closely approximates the underlying joint distribution of the variables of interest. This enables reliable estimation of conditional probabilities and marginal effects, even within subgroups containing fewer than 40 observations. Although the applications here focus on public opinion, the procedure has a wide range of potential uses. Using survey data from the 2004 and 2008 U.S. presidential election campaigns, I show that the latent class model-based approach yields substantially more accurate estimates and forecasts of vote preferences within small demographic subgroups.
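To make the density-estimation idea concrete, here is a minimal Python sketch. It is not the author's implementation. It fits a latent class model with a hand-rolled EM algorithm and then reads conditional probabilities off the fitted joint distribution instead of off raw cell counts. The helper names `fit_lcm`, `lcm_prob`, `lcm_conditional`, and `table_conditional` are hypothetical. Because the model assumes the manifest variables are independent within each latent class, any marginal or conditional probability is obtained by summing over classes and dropping the factors for variables that are not involved. The data are simulated from a known two-class model with four binary items. They are not the survey data analyzed in the paper.

```python
import itertools
import math
import random


def fit_lcm(data, n_cats, n_classes, n_iter=100, seed=0):
    """Fit a latent class model by EM (hypothetical helper, minimal sketch).

    data: list of tuples; row[j] is the category index (0..n_cats[j]-1) of variable j.
    Returns class shares pi[r] and response probabilities theta[r][j][k].
    """
    rng = random.Random(seed)
    n_vars = len(n_cats)
    pi = [1.0 / n_classes] * n_classes
    # Random, strictly positive starting values for the response probabilities.
    theta = []
    for _ in range(n_classes):
        per_var = []
        for k in n_cats:
            raw = [rng.random() + 0.5 for _ in range(k)]
            per_var.append([v / sum(raw) for v in raw])
        theta.append(per_var)
    for _ in range(n_iter):
        # E-step: posterior probability that each respondent belongs to each class.
        post = []
        for row in data:
            w = [pi[r] * math.prod(theta[r][j][row[j]] for j in range(n_vars))
                 for r in range(n_classes)]
            s = sum(w)
            post.append([v / s for v in w])
        # M-step: reweighted class shares and class-conditional category frequencies.
        for r in range(n_classes):
            mass = sum(p[r] for p in post)
            pi[r] = mass / len(data)
            for j, k in enumerate(n_cats):
                counts = [0.0] * k
                for row, p in zip(data, post):
                    counts[row[j]] += p[r]
                theta[r][j] = [c / mass for c in counts]
    return pi, theta


def lcm_prob(pi, theta, assign):
    """Model probability that X_j == x for every (j, x) in `assign`.

    Local independence within classes means unlisted variables are
    marginalized out by omitting their factors from the product.
    """
    return sum(p * math.prod(theta[r][j][x] for j, x in assign.items())
               for r, p in enumerate(pi))


def lcm_conditional(pi, theta, target, given):
    """Model-based P(target | given); both arguments map variable -> category."""
    return lcm_prob(pi, theta, {**given, **target}) / lcm_prob(pi, theta, given)


def table_conditional(data, target, given):
    """Raw contingency-table estimate of P(target | given); None if the cell is empty."""
    cell = [row for row in data if all(row[j] == x for j, x in given.items())]
    if not cell:
        return None
    hits = sum(all(row[j] == x for j, x in target.items()) for row in cell)
    return hits / len(cell)


# Simulated data: two latent classes, four binary items (not real survey data).
rng = random.Random(1)
p_one = [0.2, 0.8]  # P(item == 1) in class 0 and in class 1
data = []
for _ in range(500):
    r = 0 if rng.random() < 0.6 else 1
    data.append(tuple(int(rng.random() < p_one[r]) for _ in range(4)))

pi, theta = fit_lcm(data, n_cats=[2, 2, 2, 2], n_classes=2)

# The fitted model is a proper joint distribution over all 2**4 cells.
total = sum(lcm_prob(pi, theta, dict(enumerate(cell)))
            for cell in itertools.product(range(2), repeat=4))
print(f"total probability: {total:.6f}")

# Outcome = item 0; the subgroup is defined by stratifying on items 1, 2 and 3.
target, given = {0: 1}, {1: 1, 2: 1, 3: 0}
true_theta = [[[1 - q, q] for _ in range(4)] for q in p_one]
n_cell = sum(all(row[j] == x for j, x in given.items()) for row in data)
print("respondents in subgroup:", n_cell)
print("true value:", lcm_conditional([0.6, 0.4], true_theta, target, given))
print("raw table estimate:", table_conditional(data, target, given))
print("latent class estimate:", lcm_conditional(pi, theta, target, given))
```

In a thinly populated stratum, `table_conditional` relies only on the few respondents who fall in that exact cell. By contrast, `lcm_conditional` borrows strength from the whole sample through the shared class parameters. A real analysis would typically also choose the number of classes (for example by BIC), use several random starts to avoid local maxima, and account for the survey design. The sketch omits all of these.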
