Sampling in Difficult Settings: A Simulation Study Comparing Several Sampling Methods
Background: Taking a representative sample to determine the prevalence of variables such as disease is difficult when little is known about the target population. Several methods have been proposed, including a recent revision of the World Health Organization's Expanded Programme on Immunization (EPI) surveys. The original EPI method samples towns as Primary Sampling Units (PSUs) with probability proportional to size and uses a nearest-neighbour approach to sample households within PSUs. The new version samples from smaller PSUs and conducts a probability sample of households within those PSUs. Other techniques use satellite images and Global Positioning Systems to sample within towns, either from circles around randomly identified points ('Circle' method) or from randomly sampled squares in a superimposed grid ('Square' method). We compared these sampling methods in multiple virtual populations using computer simulation.
Methods: We constructed 50 virtual populations with varying characteristics. Populations comprised about a million people across 300 towns. We created three populations with different prevalences of disease but with uniform characteristics across each population. We created a binary exposure variable and allocated disease statuses to individuals assuming different relative risks (RRs) of exposure. We simulated thirteen methods of sampling: simple random sampling; the original EPI method and variants; the Square and Circle methods; and the new EPI method. For each population, each sampling method, and each of three sample sizes per PSU (7, 15, and 30), we simulated 1,000 samples. For most sampling methods, the PSUs were towns. We conducted simulations both using the same 30 PSUs throughout and using a freshly chosen set of PSUs. For each simulation we estimated prevalence and RRs, and combined the bias and variance across the 1,000 samples to compute the Root Mean Squared Error (RMSE).
Results: The Circle and Square methods produced almost identical results, so we report only the Square method results. Apart from simple random sampling, the Square method almost universally had the lowest RMSE for estimating prevalence, and generally had the lowest RMSE for estimating relative risks. The revised EPI approach performed worse than the Square method, but generally better than the original EPI method.
Conclusions: The Square method is recommended as statistically optimal, unless practical considerations favour another approach.
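The way bias and variance combine into an RMSE over repeated simulated samples can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the function names, the toy population of 10,000 people with 20% prevalence, and the use of simple random sampling with 210 people per sample (30 PSUs of 7) are all assumptions chosen for illustration. The variance is computed with divisor n, so that RMSE² = bias² + variance holds exactly.

```python
import math
import random


def rmse_summary(estimates, true_value):
    """Return (bias, variance, rmse) for repeated estimates of a known quantity.

    Variance uses divisor n, so rmse**2 == bias**2 + variance exactly.
    """
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = mean_estimate - true_value
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / n
    return bias, variance, math.sqrt(bias ** 2 + variance)


def simulate_srs_prevalence(population, sample_size, n_samples, rng):
    """Draw repeated simple random samples (without replacement) and return
    the estimated prevalence from each one."""
    return [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(n_samples)
    ]


# Hypothetical toy population: 1 = diseased, 0 = not diseased (20% prevalence).
rng = random.Random(0)
population = [1] * 2_000 + [0] * 8_000
true_prevalence = sum(population) / len(population)

# 1,000 simulated samples, as in the study design; 210 = 30 PSUs x 7 (assumed).
estimates = simulate_srs_prevalence(population, sample_size=210,
                                    n_samples=1_000, rng=rng)
bias, variance, rmse = rmse_summary(estimates, true_prevalence)
print(f"bias={bias:.5f} variance={variance:.6f} rmse={rmse:.5f}")
```

Comparing sampling methods would then amount to swapping `simulate_srs_prevalence` for a function implementing another design and comparing the resulting RMSE values against the same known population value.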