Statistical Considerations in the Analysis of Rare Variants

Whole-genome and whole-exome sequencing have recently begun to succeed in identifying disease-causing genes. Many of the genetic alterations involved are atypical and have low prevalence in the population; they are commonly referred to as rare variants. In this chapter, we give an overview of rare variants and their scientific relevance to medicine and public health. We then review existing methods for testing association, focusing primarily on the sequence kernel association test (SKAT) and related methods. These procedures are closely related to kernel machines, which we also describe. Finally, we discuss the implications of rare variants for multiple testing.
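
To give a concrete sense of the kind of statistic SKAT is built on, the following is a minimal sketch and not the chapter's own implementation. It assumes a continuous phenotype, an intercept-only null model with no covariates, genotypes coded as minor-allele counts (0, 1, 2), and Beta(1, 25) density weights on the minor allele frequency, a commonly used default. The function names `beta_density_weights` and `skat_statistic` are hypothetical. The sketch computes only the score-type statistic Q; obtaining a p-value requires the null distribution of Q (a mixture of chi-squares), which is not shown.

```python
import math

import numpy as np


def beta_density_weights(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at each minor allele frequency.

    With a=1, b=25 (an assumed default), rarer variants get larger weights.
    """
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * maf ** (a - 1.0) * (1.0 - maf) ** (b - 1.0)


def skat_statistic(genotypes, phenotype, a=1.0, b=25.0):
    """Score-type statistic Q = r' K r with weighted linear kernel K = G W G'.

    genotypes: (n_subjects, n_variants) array of minor-allele counts.
    phenotype: (n_subjects,) array of continuous outcomes.
    The null model here is intercept-only, so residuals are y - mean(y).
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)

    maf = G.mean(axis=0) / 2.0               # estimated minor allele frequency
    w = beta_density_weights(maf, a, b)      # per-variant weight (square root of W's diagonal)
    residuals = y - y.mean()                 # residuals under the null model

    # Per-variant score: S_j = sum_i g_ij * r_i. Then Q = sum_j w_j^2 * S_j^2,
    # which equals r' G diag(w^2) G' r without forming the n-by-n kernel.
    scores = G.T @ residuals
    return float(np.sum((w * scores) ** 2))
```

Writing Q as a weighted sum of squared per-variant scores avoids building the full kernel matrix, which matters when the number of subjects is large; the two forms are algebraically identical for the linear kernel assumed here.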
