Some estimates in the literature of the rates of genetic and congenital disorders are based on log-linear methods that model possible interactions among ascertainment sources. Often the analyst chooses the simplest model consistent with the data to estimate the size of a closed population, and then calculates confidence intervals on the assumption that this simple model is correct. We note here that, despite an apparently excellent fit of the data to such a model, the resulting confidence intervals may well be misleading: they can fail to provide adequate coverage probability. We illustrate this with a simulation of a hypothetical population based on data from three sources reported in the literature. The simulated nominal 95 per cent confidence intervals contained the modelled population size only 30 per cent of the time. Use of the simpler model's interval is justified only if external considerations support its assumptions about which interactions among sources are plausible.
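The kind of simulation described above can be sketched in plain Python. This is a hedged sketch, not the study's procedure. It makes several assumptions:

- Three sources are used, and log-linear models are fitted to the 7 observable cells by Newton-Raphson.
- Only two candidate models are considered: independence, and independence plus a single source-1 x source-2 interaction.
- The simplest model is kept unless its deviance exceeds the 95th percentile of a chi-square with 3 df. This selection rule is hypothetical.
- The interval is a simple Wald-type approximation with variance m + m^2 Var(u), where m is the fitted count of the unobserved cell and u is the model intercept. It is not the interval construction used in the study.

The function names (`fit_loglinear`, `estimate_n`, `select_and_estimate`, `simulate_counts`, `coverage`) and all population parameters are illustrative. The sketch is not intended to reproduce the 30 per cent figure.

```python
import math
import random

# The 7 observable capture histories (source1, source2, source3); (0, 0, 0) is never observed.
CELLS = [(i, j, k) for i in (0, 1) for j in (0, 1) for k in (0, 1) if (i, j, k) != (0, 0, 0)]
CHI2_3DF_95 = 7.815  # 95th percentile of chi-square, 3 df (7 cells minus 4 independence parameters)


def design(interaction_12):
    """Rows: intercept, three main effects, and (optionally) a source-1 x source-2 term."""
    return [[1.0, h[0], h[1], h[2]] + ([h[0] * h[1]] if interaction_12 else []) for h in CELLS]


def solve(a, b):
    """Solve a @ x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, a[r])) + [float(b[r])] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def weighted_gram(X, w):
    """X' diag(w) X."""
    p = len(X[0])
    return [[sum(wi * row[r] * row[c] for wi, row in zip(w, X)) for c in range(p)] for r in range(p)]


def fit_loglinear(counts, interaction_12=False, iters=100):
    """Poisson log-linear fit by Newton-Raphson. Returns (beta, deviance, Var(intercept))."""
    X = design(interaction_12)
    y = [counts[h] for h in CELLS]
    # Starting values: weighted least squares of log(y + 0.5), similar in spirit to usual GLM starts.
    w = [yi + 0.5 for yi in y]
    rhs = [sum(wi * math.log(wi) * row[r] for wi, row in zip(w, X)) for r in range(len(X[0]))]
    beta = solve(weighted_gram(X, w), rhs)
    for _ in range(iters):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        score = [sum((yi - mi) * row[r] for yi, mi, row in zip(y, mu, X)) for r in range(len(beta))]
        step = solve(weighted_gram(X, mu), score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    dev = 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi) for yi, mi in zip(y, mu))
    e0 = [1.0] + [0.0] * (len(beta) - 1)
    var_u = solve(weighted_gram(X, mu), e0)[0]  # [0, 0] entry of the inverse Fisher information
    return beta, dev, var_u


def estimate_n(counts, interaction_12=False, z=1.96):
    """N-hat = observed total + fitted (0,0,0) cell, with an ASSUMED Wald-type interval."""
    beta, dev, var_u = fit_loglinear(counts, interaction_12)
    m = math.exp(beta[0])  # fitted expected count of the never-observed cell
    n_hat = sum(counts.values()) + m
    se = math.sqrt(m + m * m * var_u)  # approximation: Poisson noise in missing cell + delta method
    return n_hat, (n_hat - z * se, n_hat + z * se), dev


def select_and_estimate(counts):
    """Hypothetical rule: use the independence model unless its deviance test rejects it."""
    n_hat, ci, dev = estimate_n(counts, interaction_12=False)
    if dev > CHI2_3DF_95:
        n_hat, ci, _ = estimate_n(counts, interaction_12=True)
    return n_hat, ci


def simulate_counts(n, p1, p2_given_s1, p3, rng):
    """Hypothetical population in which source 2's capture probability depends on source 1."""
    counts = dict.fromkeys(CELLS, 0)
    for _ in range(n):
        a = int(rng.random() < p1)
        b = int(rng.random() < p2_given_s1[a])
        c = int(rng.random() < p3)
        if (a, b, c) != (0, 0, 0):
            counts[(a, b, c)] += 1
    return counts


def coverage(n=2000, reps=200, p1=0.3, p2_given_s1=(0.15, 0.3), p3=0.3, seed=1):
    """Fraction of nominal 95% intervals, after model selection, that contain the true N."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        _, (lo, hi) = select_and_estimate(simulate_counts(n, p1, p2_given_s1, p3, rng))
        hits += lo <= n <= hi
    return hits / reps
```

Calling `coverage()` returns the simulated coverage for one parameter setting. Rerunning with `p2_given_s1=(0.2, 0.2)` makes the sources truly independent, which gives a baseline for comparison. The exact figures depend on the assumed parameters, the seed and the simplified interval, so none are quoted here.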