Multi-stage sampling in genetic epidemiology.
When data are expensive to collect, it can be cost-efficient to sample in two or more stages. In the first stage, a simple random sample is drawn and then stratified according to some easily measured attribute. In each subsequent stage, a random subset of the previously selected units is sampled for more detailed observation, with a unit's sampling probability determined by its attributes as observed in the previous stages. These designs are useful in many medical studies; here we apply them to genetic epidemiology. Two genetic studies illustrate the strengths and limitations of the approach. The first study evaluates nuclear and mitochondrial DNA in African-Americans. The goal is to estimate the relative contributions of white male genes and white female genes to the African-American gene pool. This example shows that the Horvitz-Thompson estimators proposed for multi-stage designs can be inefficient, particularly when used with unnecessary stratification. The second example is a multi-stage study of familial prostate cancer. The goal is to gather pedigrees, blood samples and archived tissue for segregation and linkage analysis of familial prostate cancer, starting from crude family data obtained from prostate cancer cases and cancer-free controls. This second example shows the gains in efficiency from multi-stage sampling when the individual likelihood or quasilikelihood scores vary substantially across strata.
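As a rough illustration of the design described above, the following Python sketch draws a two-stage sample and computes a Horvitz-Thompson estimate of a mean. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`two_stage_sample`, `ht_mean`), the stratum labels, the toy population, and the use of independent Bernoulli subsampling in the second stage are all hypothetical choices made for the example.

```python
import random


def two_stage_sample(units, n_stage1, stage2_prob, rng):
    """Draw a two-stage sample from `units`, a list of (stratum, y) pairs.

    Stage 1: simple random sample of size n_stage1; only the cheap
    stratum label is treated as observed at this point.
    Stage 2: each stage-1 unit is kept independently with probability
    stage2_prob[stratum] (Bernoulli subsampling, an assumption here),
    and the expensive measurement y is recorded for kept units.
    Returns a list of (y, inclusion_probability) pairs.
    """
    stage1 = rng.sample(units, n_stage1)
    stage2 = []
    for stratum, y in stage1:
        p = stage2_prob[stratum]
        if rng.random() < p:
            stage2.append((y, p))
    return stage2


def ht_mean(stage2, n_stage1):
    """Horvitz-Thompson estimate of the mean of y.

    Each stage-2 observation is weighted by the inverse of its
    stage-2 sampling probability, then the total is divided by the
    stage-1 sample size.
    """
    return sum(y / p for y, p in stage2) / n_stage1


# Hypothetical toy population: a rare "case" stratum with high y
# and a common "control" stratum with low y.
population = [("case", 1.0)] * 100 + [("control", 0.0)] * 900

rng = random.Random(0)
# Oversample the rare stratum in stage 2, undersample the common one.
sample = two_stage_sample(
    population, n_stage1=200,
    stage2_prob={"case": 1.0, "control": 0.2}, rng=rng,
)
estimate = ht_mean(sample, n_stage1=200)
print(f"stage-2 units: {len(sample)}, HT estimate of mean: {estimate:.3f}")
```

The inverse-probability weights keep the estimate unbiased for the stage-1 sample mean whatever second-stage probabilities are chosen, but they do not use the stratum information efficiently, which is consistent with the inefficiency the abstract reports for Horvitz-Thompson estimators.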