Summary report: Missing data and pedigree and genotyping errors

Genetic epidemiology faces the challenge of mapping complex traits to genes with relatively small effects, where the phenotypes may be modulated by temporal factors. Doing so requires detailed and accurate data on families, perhaps collected over time. The Framingham Heart Study data supplied to Genetic Analysis Workshop 13 (GAW13), along with its simulated counterpart, contain longitudinal measurements and genomic scan data on 2,885 individuals in 330 families, and offer an opportunity to examine how data quality and completeness affect analytical conclusions. Six GAW13 contributions applied methods to deal with missing phenotypic and genotypic data, both at a single time point and longitudinally, and with possible errors in pedigree structure and genotypes. The methods included imputation of missing phenotypic data by Markov chain Monte Carlo sampling, propensity scoring, regression, and adjusted mean values, as well as an assessment of transmission‐disequilibrium tests when missingness of marker data may be allele‐specific. Pedigree structural errors were detected using genome‐wide allele‐sharing probabilities, while Mendelian‐consistent genotype errors were evaluated through likelihoods of double‐recombination events. Each of the methods reviewed here offered insights into how to take better advantage of large, time‐dependent, familial data sets. However, none of them dealt with the longitudinal and familial aspects simultaneously. Overall, more consideration needs to be given to the effects that missing data and data errors have on our ability to map complex traits efficiently and accurately.

Genet Epidemiol 25 (Suppl. 1):S36–S42. © 2003 Wiley‐Liss, Inc.
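As a rough illustration of two of the simpler imputation strategies named above, the following sketch contrasts mean imputation with regression imputation on a single covariate. It is not the approach of any GAW13 contribution: the function names, the use of age as the only predictor, and the toy blood pressure values are all assumptions made for illustration, and the sketch ignores both familial correlation and repeated measurements.

```python
from statistics import mean


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def regression_impute(x, y):
    """Replace each missing y (None) with the prediction from a
    least-squares line fitted to the complete (x, y) pairs."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(xi for xi, _ in pairs)
    my = mean(yi for _, yi in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * xi if yi is None else yi
            for xi, yi in zip(x, y)]


# Hypothetical toy data: age in years and systolic blood pressure in mmHg,
# with the measurement for the oldest individual missing.
ages = [30, 40, 50, 60, 70]
sbp = [110.0, 120.0, 130.0, 140.0, None]

sbp_mean = mean_impute(sbp)            # fills with the observed mean
sbp_reg = regression_impute(ages, sbp)  # fills with the age-based prediction
```

In this toy example the two strategies give visibly different values for the missing measurement, because mean imputation ignores the trend with age. Neither accounts for relatedness among family members or for correlation across examinations, which is the gap the summary points to.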
