Multilocus Lod Scores in Large Pedigrees: Combination of Exact and Approximate Calculations

To detect the positions of disease loci, lod scores are calculated at multiple chromosomal positions given trait and marker data on members of pedigrees. Exact lod score calculations are often infeasible when both the pedigree and the number of markers are large. In this case, a Markov chain Monte Carlo (MCMC) approach provides an approximation. The accuracy of that approximation, however, depends critically on how well the sampler mixes. In this paper, we propose two methods to improve MCMC sampling and hence obtain more accurate lod score estimates in less computation time. The first generalizes the block-Gibbs meiosis (M) sampler to a multiple-meiosis (MM) sampler, in which multiple meioses are updated jointly across all loci. The second divides the computation on a large pedigree into several parts by conditioning on the haplotypes of some ‘key’ individuals. We perform exact calculations for the descendant parts, where more data are often available, and combine this information with sampling of the hidden variables in the ancestral parts. We expect our approaches to be most useful for large pedigrees with substantial missing data.
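For readers unfamiliar with the quantity being estimated, the sketch below computes a lod score in the simplest textbook setting: two loci, phase-known meioses, and a direct count of recombinants. It is an illustration only, not the multilocus pedigree calculation or the MCMC samplers described above; the function name `two_point_lod` and its parameters are hypothetical and do not come from the paper.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score for phase-known meioses (illustrative assumption).

    Returns log10 of the likelihood at recombination fraction `theta`
    divided by the likelihood under no linkage (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")

    non_recombinants = meioses - recombinants

    # Likelihood under linkage: theta^k * (1 - theta)^(n - k), on the log10 scale.
    if theta == 0.0:
        if recombinants > 0:
            # Any observed recombinant is impossible at theta = 0.
            return float("-inf")
        log_lik_linked = 0.0
    else:
        log_lik_linked = (recombinants * math.log10(theta)
                          + non_recombinants * math.log10(1.0 - theta))

    # Likelihood under no linkage: 0.5^n.
    log_lik_unlinked = meioses * math.log10(0.5)

    return log_lik_linked - log_lik_unlinked


if __name__ == "__main__":
    # Scan a grid of recombination fractions for 2 recombinants in 10 meioses.
    for theta in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
        print(f"theta={theta:.2f}  lod={two_point_lod(2, 10, theta):.3f}")
```

In large pedigrees with missing data the likelihoods in this ratio cannot be obtained by simple counting; they require summing over unobserved inheritance patterns, which is the step that exact algorithms or MCMC sampling must handle.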
