Examining the Association between Deprivation Profiles and Air Pollution in Greater London using Bayesian Dirichlet Process Mixture Models

Standard regression analyses often run into difficulty when inference must go beyond main effects in datasets containing dozens of potentially correlated variables. This situation arises, for example, in environmental deprivation studies, where a large number of deprivation scores are used as covariates, yielding an unwieldy set of interrelated data from which it is difficult to tease out the joint effect of multiple deprivation indices. We propose a method, based on Dirichlet process mixture models, that addresses these problems by using, as its basic unit of inference, a profile formed from a sequence of continuous deprivation measures. These deprivation profiles are clustered into groups and associated via a regression model with an air pollution outcome. The Bayesian clustering aspect of the proposed modeling framework has a number of advantages over traditional clustering approaches: it allows the number of groups to vary, it uncovers clusters and examines their association with an outcome of interest, and it fits the model as a unit, so that a region’s outcome can potentially influence cluster membership. The method is demonstrated with an analysis of UK Indices of Deprivation and PM10 exposure measures corresponding to super output areas (SOAs) in Greater London.
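
To make the structure of such a model concrete, the following minimal sketch simulates data from a truncated stick-breaking approximation to a Dirichlet process mixture in which each cluster carries both a deprivation profile and an outcome level. It is an illustration under stated assumptions, not the fitting procedure used in the paper: the Gaussian distributions, the truncation level, the hyperparameter values, and all function and variable names are hypothetical choices made for the example, and the sketch only runs the generative direction, whereas a real analysis would infer cluster membership from data by posterior sampling.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking construction of Dirichlet process weights."""
    weights = []
    remaining = 1.0  # length of the stick not yet allocated
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component takes the rest, so weights sum to 1
    return weights


def simulate_profiles(n_areas, n_scores, alpha=1.0, truncation=20, seed=0):
    """Simulate areas whose profile and outcome share one cluster label."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)

    # Cluster-specific parameters, drawn from assumed (hypothetical) priors.
    profile_means = [
        [rng.gauss(0.0, 2.0) for _ in range(n_scores)] for _ in range(truncation)
    ]
    outcome_levels = [rng.gauss(0.0, 1.0) for _ in range(truncation)]

    areas = []
    for _ in range(n_areas):
        # One label per area drives both the covariate profile and the outcome.
        z = rng.choices(range(truncation), weights=weights)[0]
        profile = [rng.gauss(m, 0.5) for m in profile_means[z]]
        outcome = rng.gauss(outcome_levels[z], 0.3)
        areas.append((z, profile, outcome))
    return weights, areas


weights, areas = simulate_profiles(n_areas=200, n_scores=5)
occupied = {z for z, _, _ in areas}
print(f"{len(occupied)} of {len(weights)} components are occupied")
```

Because the weights decay stochastically along the stick, only some of the available components end up occupied, which is the sense in which the number of groups is allowed to vary. The shared label `z` is what ties the profile to the outcome: when such a model is fitted jointly, both the profile and the outcome carry information about an area's cluster.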
