Genetic association testing using the GENESIS R/Bioconductor package

SUMMARY: The Genomic Data Storage (GDS) format provides efficient storage and retrieval of genotypes measured by microarrays and sequencing. We developed GENESIS to perform various single- and aggregate-variant association tests using genotype data stored in GDS format. GENESIS implements highly flexible mixed models, allowing for different link functions, multiple variance components, and phenotypic heteroskedasticity. GENESIS integrates cohesively with other R/Bioconductor packages to build a complete genomic analysis workflow entirely within the R environment.

AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/GENESIS; vignettes included.

SUPPLEMENTARY INFORMATION: Supplementary tables and figures are available at Bioinformatics online.
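
The mixed-model features named in the summary (a choice of link function, multiple variance components, and phenotypic heteroskedasticity) can be made concrete with a generic formulation. The following is a sketch in standard generalized linear mixed model notation, not a formula taken from the text above; every symbol is an assumption introduced for illustration: $g$ is the link function, $x_i$ the covariates of sample $i$, $G_i$ the genotype (or variant-set score) being tested, $\Phi_k$ a known covariance matrix such as a kinship matrix, and $s(i)$ the group of sample $i$ that has its own residual variance.

```latex
% Assumed notation: a generic mixed model with K variance components.
\begin{align}
  g\bigl(\mathbb{E}[\,y_i \mid x_i, G_i, b\,]\bigr)
    &= x_i^{\top}\beta + G_i\,\alpha + \sum_{k=1}^{K} b_{ki},
  &  b_k &\sim \mathcal{N}\bigl(0,\ \sigma_k^{2}\,\Phi_k\bigr), \\
% Identity link (continuous trait) with group-specific residual variances,
% one way to represent phenotypic heteroskedasticity:
  \operatorname{Var}(y)
    &= \sum_{k=1}^{K} \sigma_k^{2}\,\Phi_k
       + \operatorname{diag}\bigl(\sigma_{e,s(1)}^{2},\,\dots,\,\sigma_{e,s(n)}^{2}\bigr).
\end{align}
```

Under this sketch, an association test amounts to testing $H_0\colon \alpha = 0$. One common approach is to fit the null model ($\alpha = 0$) once and then test each variant or variant set against it, for example with a score test.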
