Bayesian non-parametric models for regional prevalence estimation

We developed a flexible non-parametric Bayesian model for regional disease-prevalence estimation based on cross-sectional data obtained from several subpopulations or clusters such as villages, cities, or herds. The subpopulation prevalences are modeled with a mixture distribution that allows for zero prevalence. The distribution of prevalences among diseased subpopulations is modeled as a mixture of finite Polya trees. Inferences can be obtained for (1) the proportion of diseased subpopulations in a region, (2) the distribution of regional prevalences, (3) the mean and median prevalence in the region, (4) the prevalence of any sampled subpopulation, and (5) predictive distributions of prevalences for regional subpopulations not included in the study, including the predictive probability of zero prevalence. We focus on prevalence estimation using data from a single diagnostic test, but we also briefly discuss the scenario where two conditionally dependent (or independent) diagnostic tests are used. Simulated data demonstrate the advantages of our non-parametric model over a parametric analysis. An example involving brucellosis in cattle is presented.
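
The data structure described above (subpopulations that are either disease-free or have a positive prevalence, each tested with one imperfect diagnostic test) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the model fitted in the paper: a Beta distribution stands in for the mixture of finite Polya trees, and the function name `simulate_region` and the parameters `tau`, `a`, `b`, `se`, and `sp` are hypothetical names introduced here for illustration.

```python
import random


def simulate_region(n_subpops, n_sampled, tau, a, b, se, sp, seed=0):
    """Simulate test-positive counts for the subpopulations of one region.

    tau   : probability that a subpopulation is diseased (prevalence > 0)
    a, b  : Beta parameters for prevalence among diseased subpopulations
            (ASSUMPTION: a parametric stand-in for the Polya tree mixture)
    se, sp: sensitivity and specificity of the single diagnostic test
            (ASSUMPTION: treated as known constants here)
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_subpops):
        # Mixture with a point mass at zero prevalence.
        diseased = rng.random() < tau
        prevalence = rng.betavariate(a, b) if diseased else 0.0
        # Probability that a sampled individual tests positive:
        # true positives plus false positives.
        p_pos = prevalence * se + (1.0 - prevalence) * (1.0 - sp)
        # Binomial draw written as a sum of Bernoulli trials.
        positives = sum(rng.random() < p_pos for _ in range(n_sampled))
        records.append((prevalence, positives))
    return records


records = simulate_region(
    n_subpops=50, n_sampled=100, tau=0.6, a=2.0, b=8.0, se=0.9, sp=0.99
)
zero_share = sum(1 for prev, _ in records if prev == 0.0) / len(records)
print(f"share of disease-free subpopulations: {zero_share:.2f}")
```

Because specificity is below one in this example, a disease-free subpopulation can still produce test positives, which is one reason the probability of zero prevalence has to be inferred rather than read directly from the counts.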
