Kullback–Leibler Distance Methods for Detecting Disease Association with Rare Variants from Sequencing Data

Because next-generation sequencing technology can rapidly genotype most genetic variation genome-wide, there is considerable interest in investigating the effects of rare variants on complex diseases. In this paper, we propose four Kullback–Leibler distance-based tests (KLTs) for detecting genotypic differences between cases and controls. Several features set the proposed tests apart from existing ones. First, because the KLTs explicitly compare the genotype distributions of cases and controls, the presence of variants with opposite directional effects does not compromise their power. Second, no frequency threshold is needed to define rare variants: the KL definition makes it reasonable to consider rare and common variants together without the contribution from one type overshadowing the other. Third, the KLTs are robust to null variants thanks to a built-in noise-fighting mechanism. Finally, correlation among variants is taken into account implicitly, so the KLTs work well regardless of the underlying linkage disequilibrium (LD) structure. Through extensive simulations, we demonstrate good performance of the KLTs compared with the sum of squared score test (SSU) and the optimal sequence kernel association test (SKAT-O). Moreover, an application to the Dallas Heart Study data illustrates the feasibility and performance of the KLTs in a realistic setting.
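To make the idea of comparing genotype distributions concrete, the following is a minimal sketch of a symmetrized Kullback–Leibler distance between the case and control genotype distributions at a single variant. It is not one of the four KLT statistics, which the abstract does not specify. The function names, the 0/1/2 minor-allele-count coding, and the pseudocount smoothing (used here to avoid taking the logarithm of zero for unobserved genotypes) are all assumptions made for illustration.

```python
import math


def genotype_freqs(genotypes, pseudocount=0.5):
    """Smoothed relative frequencies of genotypes 0, 1, 2 (minor-allele counts).

    The pseudocount is an illustrative assumption; it keeps every
    frequency strictly positive so the logarithms below are defined.
    """
    counts = [pseudocount] * 3
    for g in genotypes:
        counts[g] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Directed KL divergence sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(cases, controls, pseudocount=0.5):
    """Symmetrized KL distance: KL(p || q) + KL(q || p)."""
    p = genotype_freqs(cases, pseudocount)
    q = genotype_freqs(controls, pseudocount)
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical genotypes at one variant, coded as minor-allele counts.
cases = [0, 0, 1, 1, 2]
controls = [0, 0, 0, 0, 1]
distance = symmetric_kl(cases, controls)
```

Because the distance depends only on how far apart the two distributions are, it is non-negative whether a variant is enriched in cases or in controls, which is consistent with the abstract's point about variants with opposite directional effects. Significance would still have to be assessed separately, for example by permuting case and control labels.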
