Who Supported Obama in 2012?: Ecological Inference through Distribution Regression

We present a new solution to the "ecological inference" problem of learning individual-level associations from aggregate data. This problem has a long history and has attracted much attention and debate, including claims that it is unsolvable and a number of purported solutions. Unlike other ecological inference techniques, our method makes use of unlabeled individual-level data by embedding the distribution over these predictors into a vector in a Hilbert space. Our approach relies on recent learning theory results for distribution regression, using kernel embeddings of distributions. Our novel approach to distribution regression exploits the connection between Gaussian process regression and kernel ridge regression, giving us a coherent Bayesian approach to learning and inference and a convenient way to include prior information in the form of a spatial covariance function. Our approach is highly scalable because it relies on Fastfood, a randomized explicit feature representation for kernel embeddings. We apply our approach to the challenging political science problem of modeling the voting behavior of demographic groups based on aggregate voting data. We consider the 2012 US Presidential election and ask: what was the probability that members of various demographic groups supported Barack Obama, and how did this vary spatially across the country? Our results match standard survey-based exit polling data for the small number of states for which it is available, and fill in the large gaps in this data at a much higher degree of granularity.
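The distribution-regression step described above can be illustrated with a minimal sketch. This is not the authors' implementation. It substitutes plain random Fourier features for Fastfood and ordinary ridge regression for the Gaussian process model, and it omits the spatial covariance. The data are synthetic, and every name in it (`rff_map`, `embed_bags`, `ridge_fit`, `p_individual`) is hypothetical. Each region is a bag of unlabeled individuals, the bag is summarized by the mean of its feature vectors (an approximate kernel mean embedding), and the regional outcome proportion is regressed on that embedding.

```python
import numpy as np

def rff_map(X, W, b):
    """Random Fourier features approximating an RBF kernel (stand-in for Fastfood)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def embed_bags(bags, W, b):
    """Approximate kernel mean embedding: average the features within each bag."""
    return np.vstack([rff_map(X, W, b).mean(axis=0) for X in bags])

def ridge_fit(M, y, lam):
    """Ridge regression weights for embeddings M and aggregate labels y."""
    D = M.shape[1]
    return np.linalg.solve(M.T @ M + lam * np.eye(D), M.T @ y)

rng = np.random.default_rng(0)
n_train, n_test, n_per_region, d, D = 300, 100, 300, 2, 100

# Synthetic regions: each has its own covariate distribution (assumption).
centers = rng.normal(size=(n_train + n_test, d))
bags = [c + rng.normal(scale=0.5, size=(n_per_region, d)) for c in centers]

def p_individual(X):
    """Hypothetical individual-level support probability, unknown to the model."""
    return 1.0 / (1.0 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))

# Only the regional aggregate is observed, never the individual outcomes.
y = np.array([p_individual(X).mean() for X in bags])

# Draw the random feature map once and share it across all regions.
lengthscale = 1.0
W = rng.normal(scale=1.0 / lengthscale, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

M_train = embed_bags(bags[:n_train], W, b)
M_test = embed_bags(bags[n_train:], W, b)
beta = ridge_fit(M_train, y[:n_train], lam=1e-2)

pred = M_test @ beta
rmse = np.sqrt(np.mean((pred - y[n_train:]) ** 2))
# Baseline: predict the training mean for every held-out region.
baseline_rmse = np.sqrt(np.mean((y[:n_train].mean() - y[n_train:]) ** 2))
print(f"held-out RMSE: {rmse:.4f}  (baseline: {baseline_rmse:.4f})")
```

Because the feature map is explicit, the cost of the regression depends on the feature dimension `D` and not on the number of individuals. That is the property that makes an explicit representation such as Fastfood attractive at scale.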

[1]  Alexander J. Smola,et al.  Estimating labels from label proportions , 2008, ICML '08.

[2]  Richard Nock,et al.  (Almost) No Label No Cry , 2014, NIPS.

[3]  Le Song,et al.  A Hilbert Space Embedding for Distributions , 2007, Discovery Science.

[4]  Alexander Gammerman,et al.  Ridge Regression Learning Algorithm in Dual Variables , 1998, ICML.

[5]  Barnabás Póczos,et al.  Two-stage sampled learning theory on distributions , 2015, AISTATS.

[6]  Alexander J. Smola,et al.  Hilbert space embeddings of conditional distributions with applications to dynamical systems , 2009, ICML '09.

[7]  Alexander J. Smola,et al.  Fastfood: Approximate Kernel Expansions in Loglinear Time , 2014, ArXiv.

[8]  Le Song,et al.  Kernel Bayes' Rule , 2010, NIPS.

[9]  G. Wahba Spline models for observational data , 1990 .

[10]  Otis Dudley Duncan,et al.  An Alternative to Ecological Correlation , 1953 .

[11]  Sylvia Richardson,et al.  Improving ecological inference using individual‐level data , 2006, Statistics in medicine.

[12]  M. Stein,et al.  A Bayesian analysis of kriging , 1993 .

[13]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[14]  Nando de Freitas,et al.  Learning about Individuals from Group Statistics , 2005, UAI.

[15]  S. Openshaw Ecological Fallacies and the Analysis of Areal Census Data , 1984, Environment & planning A.

[16]  A. Dobson An introduction to generalized linear models , 1990 .

[17]  Barnabás Póczos,et al.  Fast Distribution To Real Regression , 2013, AISTATS.

[18]  M. C. Borja,et al.  An Introduction to Generalized Linear Models , 2009 .

[19]  D. Freedman,et al.  A solution to the ecological inference problem , 1997 .

[20]  M. Tanner,et al.  Ecological Inference: New Methodological Strategies , 2004 .

[21]  Andrew Gelman,et al.  Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do , 2008 .

[22]  Bernhard Schölkopf,et al.  A Kernel Two-Sample Test , 2012, J. Mach. Learn. Res..

[23]  Matt A. Barreto,et al.  Controversies in Exit Polling: Implementing a Racially Stratified Homogenous Precinct Approach , 2006, PS: Political Science & Politics.

[24]  Ross L. Prentice,et al.  Aggregate data studies of disease risk factors , 1995 .

[25]  Leo A. Goodman,et al.  Some Alternatives to Ecological Correlation , 1959, American Journal of Sociology.

[26]  David Barber,et al.  Bayesian Classification With Gaussian Processes , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[27]  Karsten M. Borgwardt,et al.  Covariate Shift by Kernel Mean Matching , 2009, NIPS 2009.

[28]  Barnabás Póczos,et al.  Distribution-Free Distribution Regression , 2013, AISTATS.

[29]  AI Koan,et al.  Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , 2008, NIPS.

[30]  Aki Vehtari,et al.  GPstuff: Bayesian modeling with Gaussian processes , 2013, J. Mach. Learn. Res..

[31]  W. S. Robinson,et al.  Ecological correlations and the behavior of individuals. , 1950, International journal of epidemiology.

[32]  Bernhard Schölkopf,et al.  Kernel Mean Estimation and Stein Effect , 2013, ICML.

[33]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[34]  Thomas G. Dietterich,et al.  Collective Graphical Models , 2011, NIPS.

[35]  Thomas Gärtner,et al.  Multi-Instance Kernels , 2002, ICML.

[36]  Aki Vehtari,et al.  Laplace approximation for logistic Gaussian process density estimation and regression , 2012, 1211.0174.