Efficient identification of identical-by-descent status in pedigrees with many untyped individuals

Motivation: Inference of identical-by-descent (IBD) probabilities is key to family-based linkage analysis. With high-density single nucleotide polymorphism (SNP) markers, one can almost always infer the haplotype configuration of every family member when all individuals are typed, and the IBD status can then be read directly from those haplotype configurations. In practice, however, many family members are untyped for practical reasons, and IBD/haplotype inference becomes much harder when untyped individuals are treated as missing data.

Results: We present a novel hidden Markov model (HMM) approach to infer IBD status in pedigrees with many untyped members using high-density SNP markers. We introduce the concept of an inheritance-generating function, defined for any pair of alleles in a descent graph based on the pedigree structure, and derive a recursive formula for its efficient calculation. Because it aggregates all possible inheritance patterns through an explicit representation of the number and lengths of all possible paths between two alleles, the inheritance-generating function provides a convenient way to derive the transition probabilities of the HMM theoretically. We further extend the basic HMM to incorporate population linkage disequilibrium (LD). Pedigree-wise IBD sharing can be constructed from pair-wise IBD relationships. Compared with traditional approaches to linkage analysis, our model infers IBD status efficiently without enumerating all possible genotypes and transmission patterns of the untyped members of a family. It can be applied reliably to large pedigrees with many untyped members, and the inferred IBD status can be used for non-parametric genome-wide linkage analysis.

Availability: The algorithm is implemented in Matlab and is freely available upon request.

Contact: jingli@cwru.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
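The abstract does not spell out the recursion, so what follows is only a sketch under our own assumptions, in the spirit of Wright's path counting and Karigl's recursive algorithm. We take the inheritance-generating function of two alleles a and b to be the polynomial G(x) = sum_k n_k x^k, where n_k counts the descent paths of k meioses that join a and b through a common ancestral allele (the two lines of descent meeting only at that allele). With unrelated, non-inbred founders, G(1/2) is then the probability that a and b are IBD, and, for a recombination fraction theta between two loci, G((1 - theta)/2) is the probability that they are IBD at both loci through the same path. The toy pedigree, the names path_gf and evaluate, and the (individual, 0/1) allele encoding are hypothetical illustrations, not the authors' Matlab code.

```python
from functools import lru_cache

# Hypothetical toy pedigree: id -> (father, mother), or None for a founder.
# Parents must be listed before their children; founders are assumed
# unrelated and non-inbred.
PEDIGREE = {
    1: None, 2: None,      # grandparents
    3: (1, 2), 4: (1, 2),  # full sibs
    5: None, 6: None,      # married-in spouses
    7: (3, 5), 8: (4, 6),  # first cousins
}
ORDER = {ind: k for k, ind in enumerate(PEDIGREE)}  # dicts keep insertion order


def poly_add(p, q):
    """Add coefficient tuples (entry k is the coefficient of x**k)."""
    n = max(len(p), len(q))
    return tuple((p[k] if k < len(p) else 0) + (q[k] if k < len(q) else 0)
                 for k in range(n))


@lru_cache(maxsize=None)
def path_gf(a, b):
    """Coefficients (n_0, n_1, ...) of G(x) = sum_k n_k x**k for alleles a, b.

    An allele is (individual, s) with s = 0 (paternal) or 1 (maternal).
    n_k counts paths of k meioses joining a and b via a common ancestral allele.
    """
    if a == b:
        return (1,)
    if ORDER[a[0]] < ORDER[b[0]]:
        a, b = b, a                # now a cannot be an ancestor of b
    parents = PEDIGREE[a[0]]
    if parents is None:
        return (0,)                # distinct founder alleles are never IBD
    parent = parents[a[1]]         # father for s = 0, mother for s = 1
    both = poly_add(path_gf((parent, 0), b), path_gf((parent, 1), b))
    return (0,) + both             # one more meiosis: multiply by x


def evaluate(coeffs, x):
    """Evaluate the polynomial at x (x = 1/2 gives the IBD probability)."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Paternal alleles of the full sibs: two paths of two meioses each.
print(path_gf((3, 0), (4, 0)), evaluate(path_gf((3, 0), (4, 0)), 0.5))
# prints (0, 0, 2) 0.5
```

For the first cousins' paternal alleles (7, 0) and (8, 0), the same recursion finds four paths of four meioses, so G(1/2) = 1/4. Memoizing on allele pairs means each pair is expanded once, instead of enumerating every inheritance pattern of the untyped members.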

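To illustrate how a pairwise IBD state can be decoded along dense SNPs, here is a minimal forward-backward sketch of a generic two-state HMM (state 0 = not IBD, state 1 = IBD) for one pair of alleles. It assumes phased 0/1 alleles, one hypothetical switch probability for every inter-marker interval, a stationary IBD probability prior_ibd, a simple error allowance eps, and no LD. The paper instead derives its transition probabilities from the inheritance-generating function and extends the model to LD, so this is not the authors' model; the function ibd_posterior and all of its parameter names are hypothetical.

```python
import numpy as np


def ibd_posterior(x, y, freqs, prior_ibd, switch, eps=0.01):
    """Posterior P(IBD) at each SNP for two phased alleles (hypothetical model).

    x, y      : 0/1 alleles of the two haplotypes at consecutive SNPs
    freqs     : population frequency of allele 1 at each SNP
    prior_ibd : stationary IBD probability of the pair
    switch    : assumed per-interval probability of re-drawing the state
    eps       : allowance for genotyping error/mutation under IBD
    """
    x, y = np.asarray(x), np.asarray(y)
    p = np.asarray(freqs, dtype=float)
    fx = np.where(x == 1, p, 1 - p)
    fy = np.where(y == 1, p, 1 - p)
    # Emissions: column 0 = not IBD (independent draws), column 1 = IBD.
    e_ibd = (1 - eps) * fx * (x == y) + eps * fx * fy
    E = np.column_stack([fx * fy, e_ibd])
    k, r = prior_ibd, switch
    # With probability r the state is re-drawn from (1 - k, k), so (1 - k, k)
    # is the stationary distribution.
    A = np.array([[1 - k * r, k * r],
                  [(1 - k) * r, 1 - (1 - k) * r]])
    n = len(x)
    alpha, beta, c = np.zeros((n, 2)), np.ones((n, 2)), np.zeros(n)
    alpha[0] = np.array([1 - k, k]) * E[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, n):  # scaled forward pass
        alpha[t] = (alpha[t - 1] @ A) * E[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(n - 2, -1, -1):  # scaled backward pass
        beta[t] = A @ (E[t + 1] * beta[t + 1]) / c[t + 1]
    post = alpha * beta
    return post[:, 1] / post.sum(axis=1)


# Usage: 20 SNPs where both haplotypes carry the rare allele (frequency 0.1).
post = ibd_posterior([1] * 20, [1] * 20, [0.1] * 20, prior_ibd=0.5, switch=0.05)
print(post.round(3))
```

Here prior_ibd could be set to a pairwise IBD probability, for example 1/2 for the paternal alleles of two full sibs. A run of shared rare alleles pushes the posterior toward IBD, while a mismatch at a SNP can be explained under the IBD state only through eps.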