Inferring rare disease risk variants based on exact probabilities of sharing by multiple affected relatives

MOTIVATION: Family-based designs are regaining popularity for genomic sequencing studies because they provide a way to test cosegregation with disease of variants that are too rare in the population to be tested individually in a conventional case-control study.

RESULTS: Where only a few affected subjects per family are sequenced, the probability that any variant would be shared by all affected relatives, given that it occurred in any one family member, provides evidence against the null hypothesis of a complete absence of linkage and association. A P-value can be obtained as the sum of the probabilities of sharing events as (or more) extreme in one or more families. We generalize an existing closed-form expression for exact sharing probabilities to more than two relatives per family. When pedigree founders are related, we show that an approximation of sharing probabilities based on empirical estimates of kinship among founders obtained from genome-wide marker data is accurate for low levels of kinship. We also propose a more generally applicable approach based on Monte Carlo simulations. We applied this method to a study of 55 multiplex families with apparent non-syndromic forms of oral clefts from four distinct populations, with whole exome sequences available for two or three affected members per family. The rare single nucleotide variant rs149253049 in ADAMTS9 shared by affected relatives in three Indian families achieved significance after correcting for multiple comparisons ([Formula: see text]).

AVAILABILITY AND IMPLEMENTATION: Source code and binaries of the R package RVsharing are freely available for download at http://cran.r-project.org/web/packages/RVsharing/index.html.

CONTACT: alexandre.bureau@msp.ulaval.ca or ingo@jhu.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
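The two quantities described in the results (the probability that all sequenced affected relatives share a variant given that at least one carries it, and the P-value summed over sharing events across families) can be illustrated with a minimal Python sketch. It is not the RVsharing implementation, and the function names `sharing_probability` and `sharing_p_value` are hypothetical. It assumes that a single copy of the variant enters the pedigree through exactly one founder, that every founder is equally likely to be that founder, and that founders are unrelated with no inbreeding. It also assumes one reading of "as (or more) extreme": sharing patterns whose probability does not exceed that of the observed pattern.

```python
from fractions import Fraction
from itertools import product
from math import prod


def sharing_probability(parents, affected):
    """P(all `affected` carry the variant | at least one of them carries it).

    `parents` maps each non-founder to its (father, mother) pair; founders
    appear only as parents. Assumes one variant copy introduced by a single
    founder, equally likely founders, and no inbreeding.
    """
    individuals = set(parents) | {p for pair in parents.values() for p in pair}
    founders = sorted(individuals - set(parents))

    # Order non-founders so that parents always precede their children.
    order, placed, remaining = [], set(founders), set(parents)
    while remaining:
        ready = sorted(c for c in remaining
                       if all(p in placed for p in parents[c]))
        if not ready:
            raise ValueError("pedigree contains a cycle")
        order.extend(ready)
        placed.update(ready)
        remaining.difference_update(ready)

    weight = Fraction(1, 2) ** len(order)  # probability of one flip pattern
    all_share = Fraction(0)
    any_carrier = Fraction(0)
    for founder in founders:
        # One fair coin per non-founder: does a carrier parent transmit
        # the variant allele? The flip has no effect if no parent carries it.
        for flips in product((False, True), repeat=len(order)):
            carriers = {founder}
            for child, transmitted in zip(order, flips):
                if transmitted and any(p in carriers for p in parents[child]):
                    carriers.add(child)
            if all(a in carriers for a in affected):
                all_share += weight
            if any(a in carriers for a in affected):
                any_carrier += weight
    return all_share / any_carrier


def sharing_p_value(probs, observed):
    """Sum the probabilities of sharing patterns no more probable than the
    observed one. `probs[i]` is family i's sharing probability and
    `observed[i]` is True if all its sequenced affected members share."""
    def pattern_prob(pattern):
        return prod(p if shared else 1 - p
                    for p, shared in zip(probs, pattern))

    observed_prob = pattern_prob(observed)
    return sum(pr for pr in map(pattern_prob,
                                product((True, False), repeat=len(probs)))
               if pr <= observed_prob)


if __name__ == "__main__":
    # Hypothetical pedigree: two affected first cousins.
    cousins = {"p1": ("gf", "gm"), "p2": ("gf", "gm"),
               "c1": ("p1", "sp1"), "c2": ("p2", "sp2")}
    p = sharing_probability(cousins, ["c1", "c2"])
    print(p)  # 1/15
    # Three such hypothetical families, all sharing the variant.
    print(sharing_p_value([p] * 3, [True] * 3))  # 1/3375
```

Exact rational arithmetic keeps the comparison between pattern probabilities free of floating-point ties. The enumeration grows as 2^n in the number of non-founders, so it suits only small pedigrees; larger or inbred pedigrees would need a closed-form expression or Monte Carlo simulation, as the abstract indicates.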
