Maximum Likelihood Estimation for Ascertainment Bias in Sampling Siblings

When a disease is rare in a population, a simple random sample is an inefficient way to estimate a disease-related parameter, because it will contain few affected individuals. Instead, one takes a random sample of the nuclear families with the disease, ascertaining each family through at least one affected sibling (a proband). In such studies, a naive estimate of the proportion of siblings with the disease is inflated. For example, the question of whether a rare disease shows an autosomal recessive pattern of inheritance, where the Mendelian segregation ratios are of interest, has been studied for several decades. How do we correct for this ascertainment bias? Methods are available, primarily based on maximum likelihood estimation. We show that, although maximum likelihood estimation is optimal under asymptotic theory, it can perform badly in the presence of ascertainment bias. The problem is exacerbated when the proband probabilities are allowed to vary with the number of affected siblings. We use two data sets to illustrate the difficulties of the maximum likelihood estimation procedure, and we use a simulation study to assess the quality of the maximum likelihood estimators.
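
The inflation described above can be illustrated with a small simulation. The sketch below is not the paper's method: it assumes the simplest textbook case, in which every sibship has the same size, siblings are affected independently with probability p, and a family enters the sample whenever it has at least one affected sibling (so the affected count follows a zero-truncated binomial distribution). The function names, the sibship size of 4, and p = 0.25 are hypothetical choices for illustration. Proband probabilities that vary with the number of affected siblings, which the abstract identifies as the harder case, are not modelled here.

```python
import random


def simulate_ascertained_families(n_families, sibship_size, p, rng):
    """Draw affected counts, keeping only families with >= 1 affected sibling.

    Assumption: siblings are affected independently with probability p, and
    any family with at least one affected sibling is ascertained.
    """
    counts = []
    while len(counts) < n_families:
        affected = sum(rng.random() < p for _ in range(sibship_size))
        if affected >= 1:
            counts.append(affected)
    return counts


def naive_estimate(counts, sibship_size):
    """Proportion of affected siblings, ignoring how families were sampled."""
    return sum(counts) / (len(counts) * sibship_size)


def truncated_binomial_mle(counts, sibship_size, tol=1e-10):
    """MLE of p under the zero-truncated binomial model.

    The score equation reduces to
        sum(counts) = n * s * p / (1 - (1 - p)**s),
    whose right-hand side increases in p, so bisection finds the root.
    """
    n, s, total = len(counts), sibship_size, sum(counts)
    if total == n:        # every family has exactly one affected sibling
        return 0.0
    if total == n * s:    # every sibling is affected
        return 1.0

    def excess(p):
        return n * s * p / (1.0 - (1.0 - p) ** s) - total

    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    counts = simulate_ascertained_families(5000, 4, 0.25, rng)
    print("naive estimate:", naive_estimate(counts, 4))
    print("truncated-binomial MLE:", truncated_binomial_mle(counts, 4))
```

With many families and a correctly specified model, the naive proportion should sit well above the true p while the truncated-binomial estimate should sit close to it. This is the favourable, large-sample setting; it does not reproduce the small-sample difficulties that the abstract reports.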
