Ecological Inference: Prior and Likelihood Choices in the Analysis of Ecological Data

A general statistical framework for ecological inference is presented, and a number of previously proposed approaches are described and critiqued within this framework. In particular, we clarify the assumptions that every approach requires to overcome the fundamental nonidentifiability problem of ecological inference. We describe a number of three-stage Bayesian hierarchical models that are flexible enough to incorporate substantive prior knowledge and additional data, but we also illustrate that great care must be taken when specifying prior distributions. The choice of the likelihood function for aggregate data is discussed. We argue that, for aggregate 2 × 2 data, a choice consistent with a realistic sampling scheme is a convolution of binomial distributions, which naturally incorporates the bounds on the unobserved cells of the constituent 2 × 2 tables. For large marginal counts this likelihood is computationally daunting, so we discuss a simple normal approximation previously described by Wakefield (2004). Various computational schemes are described, ranging from an auxiliary data scheme for tables with small counts to Markov chain Monte Carlo algorithms that are efficient for tables with larger marginal counts. We investigate prior, likelihood, and computational choices using simulated data and registration–race data from four southern U.S. states.
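The convolution likelihood for a single 2 × 2 table can be illustrated with a minimal sketch. It assumes a table with row totals `n0` and `n1`, row-specific probabilities `p0` and `p1`, and an observed column total `y` equal to the sum of two independent binomial counts. All function and variable names are hypothetical, and the normal approximation shown is the standard moment-matching form, which we assume (rather than confirm from the text) matches the one attributed to Wakefield (2004).

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def unobserved_cell_bounds(y, n0, n1):
    """Bounds on the unobserved count y0 implied by the margins (y, n0, n1)."""
    return max(0, y - n1), min(n0, y)


def convolution_likelihood(y, n0, n1, p0, p1):
    """P(Y = y) where Y = Y0 + Y1, Y0 ~ Bin(n0, p0), Y1 ~ Bin(n1, p1).

    The sum runs only over values of y0 that respect the bounds,
    so the bounds enter the likelihood automatically.
    """
    lo, hi = unobserved_cell_bounds(y, n0, n1)
    return sum(
        binom_pmf(y0, n0, p0) * binom_pmf(y - y0, n1, p1)
        for y0 in range(lo, hi + 1)
    )


def normal_approx_likelihood(y, n0, n1, p0, p1):
    """Normal density matching the mean and variance of Y (assumed form)."""
    mean = n0 * p0 + n1 * p1
    var = n0 * p0 * (1 - p0) + n1 * p1 * (1 - p1)
    return math.exp(-((y - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


if __name__ == "__main__":
    # Small table: the exact convolution is cheap to evaluate.
    print(unobserved_cell_bounds(5, 3, 4))
    print(convolution_likelihood(5, 3, 4, 0.2, 0.7))
    # Larger margins: compare the exact value with the approximation.
    print(convolution_likelihood(240, 200, 300, 0.3, 0.6))
    print(normal_approx_likelihood(240, 200, 300, 0.3, 0.6))
```

The exact sum has up to `min(n0, y) + 1` terms per table and must be re-evaluated at every candidate `(p0, p1)`, which is why the cost grows with the marginal counts and an approximation becomes attractive.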
