Bayesian Copula Density Deconvolution for Zero-Inflated Data in Nutritional Epidemiology

Estimating the marginal and joint densities of the long-term average intakes of different dietary components is an important problem in nutritional epidemiology. Since these variables cannot be directly measured, data are usually collected in the form of 24-hour recalls of the intakes, which show marked patterns of conditional heteroscedasticity. Significantly compounding the challenges, the recalls for episodically consumed dietary components also include exact zeros. Estimating the density of the latent long-term intakes from their observed, measurement-error-contaminated proxies is then a problem of density deconvolution with zero-inflated data. We propose a Bayesian semiparametric solution to the problem, building on a novel hierarchical latent variable framework that translates the problem to one involving continuous surrogates only. To accommodate important aspects of the problem, we then design a copula-based approach to model the involved joint distributions, adopting different modeling strategies for the marginals of the different dietary components. We design efficient Markov chain Monte Carlo algorithms for posterior inference and illustrate the efficacy of the proposed method through simulation experiments. When applied to our motivating nutritional epidemiology problems, our method provides more realistic estimates of the consumption patterns of episodically consumed dietary components than other approaches.
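To make the data structure concrete, the following is a minimal simulation sketch of zero-inflated, conditionally heteroscedastic recalls. It is an illustration only and not the model proposed in the paper: the gamma distribution for the latent intake, the probit-type thresholding of a continuous surrogate, the multiplicative log-normal error, and all names and parameter values (`simulate_recalls`, `shift`, `sigma`) are assumptions chosen for simplicity.

```python
import numpy as np


def simulate_recalls(n_subjects=500, n_recalls=4, shift=1.0, sigma=0.4, seed=0):
    """Simulate toy zero-inflated 24-hour recalls (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # Latent long-term average intake X_i > 0 (gamma is a hypothetical choice).
    x = rng.gamma(shape=2.0, scale=1.0, size=n_subjects)

    # Continuous surrogate for the consumption decision: a recall is positive
    # only when the surrogate exceeds zero, so habitual consumers report
    # fewer exact zeros.
    surrogate = (x[:, None] - shift) + rng.normal(size=(n_subjects, n_recalls))
    consumed = surrogate > 0

    # Multiplicative error: the standard deviation of a positive recall grows
    # with X_i, which gives conditional heteroscedasticity.
    amount = x[:, None] * np.exp(sigma * rng.normal(size=(n_subjects, n_recalls)))

    # Observed recalls: exact zeros on non-consumption days.
    w = np.where(consumed, amount, 0.0)
    return x, w, consumed


if __name__ == "__main__":
    x, w, consumed = simulate_recalls()
    print("proportion of exact zeros:", np.mean(w == 0.0))
```

In this toy setup the observed recalls `w` mix a point mass at zero with a continuous component whose spread depends on the latent intake `x`; recovering the density of `x` from `w` alone is the deconvolution problem described above.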
