An exact, unifying framework for region-based association testing in family-based designs, including higher criticism approaches, SKATs, multivariate and burden tests

The analysis of rare variants in family-based studies remains a challenge. For region- or set-based association analysis of rare variants in such studies, we propose a general methodological framework that integrates higher criticism, maximum, SKAT, and burden approaches into the family-based association testing (FBAT) framework. Because the haplotype algorithm for FBATs is used to compute the conditional genotype distribution under the null hypothesis of Mendelian transmissions, virtually any association test statistic can be implemented in our approach, and simulation-based or exact p-values can be computed without relying on asymptotic approximations. Using simulations, we compare the proposed test statistics with the existing region-based methodology for family-based studies under various scenarios. In these simulations, the tests of our framework outperform the existing approaches. We provide general guidelines on the scenarios, e.g., sparseness of the signals or local LD structure, in which each test statistic has distinct power advantages over the others. We also illustrate our approach in an application to a whole-genome sequencing dataset with 897 asthmatic trios.
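The idea of obtaining p-values from the conditional genotype distribution under Mendelian transmissions, rather than from an asymptotic approximation, can be illustrated with a deliberately simplified sketch. It is not the haplotype algorithm for FBATs: it assumes fully phased parental haplotypes, no recombination within the region, trios with affected offspring only, and a one-sided test for over-transmission. All function names (`offspring_genotype`, `burden_stat`, `max_stat`, `exact_p_value`, `simulated_p_value`) and the toy data are hypothetical and chosen for illustration.

```python
import random
from itertools import product

# A trio is (father, mother). Each parent is a pair of phased haplotypes;
# each haplotype is a tuple of 0/1 minor-allele indicators, one per variant.
# ASSUMPTION: phase is known and there is no recombination in the region.

def offspring_genotype(father, mother, pick_f, pick_m):
    """Minor-allele counts per variant when the offspring inherits
    haplotype `pick_f` from the father and `pick_m` from the mother."""
    return [a + b for a, b in zip(father[pick_f], mother[pick_m])]

def burden_stat(genotypes):
    """Burden-type statistic: total minor-allele count over the region."""
    return sum(sum(g) for g in genotypes)

def max_stat(genotypes):
    """Maximum-type statistic: largest single-variant minor-allele count."""
    return max(sum(column) for column in zip(*genotypes))

def exact_p_value(trios, observed, stat):
    """Enumerate all 4**n equally likely transmission patterns under the
    Mendelian null and return P(stat >= observed stat)."""
    t_obs = stat(observed)
    hits = 0
    for picks in product((0, 1), repeat=2 * len(trios)):
        sim = [offspring_genotype(f, m, picks[2 * i], picks[2 * i + 1])
               for i, (f, m) in enumerate(trios)]
        hits += stat(sim) >= t_obs
    return hits / 4 ** len(trios)

def simulated_p_value(trios, observed, stat, n_sim=10000, seed=0):
    """Monte Carlo version: draw each transmitted haplotype with
    probability 1/2 and count draws at least as extreme as observed."""
    rng = random.Random(seed)
    t_obs = stat(observed)
    hits = 0
    for _ in range(n_sim):
        sim = [offspring_genotype(f, m, rng.randrange(2), rng.randrange(2))
               for f, m in trios]
        hits += stat(sim) >= t_obs
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)

# Toy data: three trios, two variants; each father carries one copy of
# variant 1 and every affected offspring inherited it.
carrier = ((1, 0), (0, 0))
non_carrier = ((0, 0), (0, 0))
trios = [(carrier, non_carrier)] * 3
observed = [[1, 0]] * 3

print(exact_p_value(trios, observed, burden_stat))  # 0.125
print(simulated_p_value(trios, observed, max_stat))
```

Because the null distribution is generated directly from the transmissions, any statistic with the same call signature can be passed as `stat`. Exact enumeration grows as 4^n in the number of trios, so the simulated version is the practical choice beyond a handful of families in this sketch.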
