On the Bayesian analysis of population size

SUMMARY We consider the problem of estimating the total size of a population from a series of incomplete census data. We observe that inference is typically highly sensitive to the choice of model, and we demonstrate how Bayesian model averaging techniques easily overcome this problem. We combine and extend the work of Madigan & York (1997) and Dellaportas & Forster (1999), using reversible jump Markov chain Monte Carlo simulation to calculate posterior model probabilities, which can then be used to estimate model-averaged statistics of interest. We provide a detailed description of the simulation procedures involved and consider a wide variety of modelling issues, such as the range of models considered, their parameterisation, both prior choice and sensitivity, and computational efficiency. We consider a detailed example concerning adolescent injuries in Pennsylvania on the basis of medical, school and survey data. In the context of this example, we discuss the relationship between posterior model probabilities and the associated information criteria values for model selection. We also discuss cost-efficiency issues, with particular reference to inclusion and exclusion of sources on the grounds of cost. We consider a decision-theoretic approach, which balances the cost and accuracy of different combinations of data sources to guide future decisions on data collection.

[1]  S. Fienberg. The multiple recapture census for closed populations and incomplete 2^k contingency tables, 1972.

[2]  E. Hook, et al. Use of Bernoulli census and log-linear methods for estimating the prevalence of spina bifida in livebirths and the completeness of vital record reports in New York State, 1980, American journal of epidemiology.

[3]  D. Edwards, et al. A fast procedure for model search in multidimensional contingency tables, 1985.

[4]  John Aitchison, et al. The Statistical Analysis of Compositional Data, 1986.

[5]  C. Clogg, et al. Analysis of Sets of Two-Way Contingency Tables Using Association Models, 1989.

[6]  A. Dawid, et al. Hyper Markov Laws in the Statistical Analysis of Decomposable Graphical Models, 1993.

[7]  A. Leyland, et al. Estimating the population prevalence of injection drug use and infection with human immunodeficiency virus among injection drug users in Glasgow, Scotland, 1993, American journal of epidemiology.

[8]  T. J. Songer, et al. Efficiency and accuracy of disease monitoring systems: application of capture-recapture methods to injury monitoring, 1995, American journal of epidemiology.

[9]  P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, 1995.

[10]  R. R. Regal, et al. Capture-recapture methods in epidemiology: methods and limitations, 1995, Epidemiologic reviews.

[11]  Ronald E. LaPorte, et al. Capture-recapture and multiple-record systems estimation II: Applications in human diseases. International Working Group for Disease Monitoring and Forecasting, 1995, American journal of epidemiology.

[12]  P. Green, et al. On Bayesian Analysis of Mixtures with an Unknown Number of Components (with discussion), 1997.

[13]  J. York, et al. Bayesian methods for estimation of the size of a closed population, 1997.

[14]  Gordon Hay, et al. The selection from multiple data sources in epidemiological capture–recapture studies, 1997.

[15]  D. Pauler. The Schwarz criterion and related methods for normal linear models, 1998.

[16]  Stephen P. Brooks, et al. Markov chain Monte Carlo method and its application, 1998.

[17]  Anthony O'Hagan, et al. Eliciting expert beliefs in substantial practical applications, 1998.

[18]  A. Agresti, et al. The Use of Mixed Logit Models to Reflect Heterogeneity in Capture‐Recapture Studies, 1999, Biometrics.

[19]  S. Fienberg, et al. Classical multilevel and Bayesian approaches to population size estimation using multiple lists, 1999.

[20]  P. Dellaportas, et al. Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models, 1999.

[21]  Ruth King, et al. Prior induction in log-linear models for general contingency table analysis, 2001.

[22]  Issy Bray, et al. Application of Markov chain Monte Carlo methods to modelling birth prevalence of Down syndrome, 2002.