Nonparametric Deconvolution Models

We describe nonparametric deconvolution models (NDMs), a family of Bayesian nonparametric models for collections of data in which each observation is the average over the features of heterogeneous particles. Such data arise, for example, in elections, where we observe precinct-level vote tallies (observations) that aggregate individual citizens' votes (particles) across candidates or ballot measures (features), and where each voter belongs to a specific voter cohort or demographic (factor). Like the hierarchical Dirichlet process, NDMs rely on two tiers of Dirichlet processes to explain the data with an unknown number of latent factors; each observation is modeled as a weighted average of these latent factors. Unlike existing models, NDMs recover how factor distributions vary locally for each observation. This uniquely allows NDMs both to deconvolve each observation into its constituent factors and to describe how the observation-specific factor distributions vary across observations and deviate from the corresponding global factors. We present variational inference techniques for this family of models and study their performance on simulated data and on voting data from California. We show that including local factors improves estimates of global factors and provides a novel scaffold for exploring data.
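
The generative structure described above (global factors, observation-specific local factors that deviate from them, and observations formed as weighted averages) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model specification: the Gaussian factor features, the truncation level `K`, the finite Dirichlet stand-in for the second-tier Dirichlet process, and all names (`stick_breaking`, `simulate_ndm_like`, `local_sd`, `gamma`) are hypothetical choices made for illustration.

```python
import numpy as np


def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking weights (Sethuraman-style construction)."""
    betas = rng.beta(1.0, alpha, size=K)
    betas[-1] = 1.0  # truncation: the last stick takes all remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def simulate_ndm_like(n_obs=5, n_features=3, K=4, alpha=1.0, gamma=5.0,
                      local_sd=0.1, seed=0):
    """Simulate data with global factors, local factors, and averaged observations.

    Assumptions (not taken from the source): Gaussian factor features, a
    truncation at K factors, and a finite Dirichlet approximation for the
    per-observation factor proportions.
    """
    rng = np.random.default_rng(seed)

    # Tier 1: global factor weights and global factor features.
    global_weights = stick_breaking(alpha, K, rng)
    global_factors = rng.normal(0.0, 1.0, size=(K, n_features))

    # Tier 2: per-observation factor proportions centered on the global weights.
    concentration = gamma * global_weights + 1e-6  # keep parameters strictly positive
    proportions = rng.dirichlet(concentration, size=n_obs)

    # Local factors: each observation gets its own perturbed copy of every global factor.
    noise = rng.normal(0.0, local_sd, size=(n_obs, K, n_features))
    local_factors = global_factors[None, :, :] + noise

    # Each observation is the proportion-weighted average of its local factors.
    observations = np.einsum("nk,nkf->nf", proportions, local_factors)
    return global_weights, global_factors, proportions, local_factors, observations


if __name__ == "__main__":
    w, g, p, l, x = simulate_ndm_like()
    print("observations shape:", x.shape)
```

Deconvolution in this sketch would mean recovering `proportions` and `local_factors` from `observations` alone; the simulation shows only the forward direction and does not implement the variational inference mentioned in the abstract.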
