Model-based approach for household clustering with mixed scale variables

The Ministry of Social Development in Mexico is in charge of creating and assigning social programmes that target specific needs in the population in order to improve quality of life. To better target these programmes, the Ministry aims to find clusters of households with the same needs, based on the demographic characteristics and poverty conditions of each household. The available data consist of continuous, ordinal, and nominal variables, all of which come from a non-i.i.d. complex design survey sample. We propose a Bayesian nonparametric mixture model that jointly models a set of latent variables, as in an underlying variable response approach, associated with the observed mixed scale data, and that accommodates the different sampling probabilities. The performance of the model is assessed via simulated data. A full analysis of socio-economic conditions in households in the Mexican State of Mexico is presented.
