Variant association tools for quality control and analysis of large-scale sequence and genotyping array data.

There is currently great interest in detecting associations between complex traits and rare variants. In this report, we describe Variant Association Tools (VAT) and the VAT pipeline, which implements best practices for rare-variant association studies. Highlights of VAT include variant-site and call-level quality control (QC), summary statistics, phenotype- and genotype-based sample selection, variant annotation, selection of variants for association analysis, and a collection of rare-variant association methods for analyzing qualitative and quantitative traits. The association testing framework for VAT is regression based, which readily allows for flexible construction of association models with multiple covariates and weighting schemes based on allele frequencies or predicted functionality. Additionally, pathway analyses, conditional analyses, and analyses of gene-gene and gene-environment interactions can be performed. VAT is capable of rapidly scanning through data by using multi-process computation and adaptive permutation and by conducting association analyses with multiple methods simultaneously. Results are available in text or graphic file formats and can additionally be output to relational databases for further annotation and filtering. An interface to the R language also facilitates user implementation of novel association methods. VAT's data QC and association-analysis pipeline can be applied to sequence, imputed, and genotyping-array (e.g., "exome chip") data, providing a reliable and reproducible computational environment in which to analyze small- to large-scale studies with data from the latest genotyping and sequencing technologies. Application of the VAT pipeline is demonstrated through analysis of data from the 1000 Genomes Project.
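As a rough illustration of two ideas named in the abstract, allele-frequency-based weighting and adaptive permutation, the following Python sketch may help. It is not VAT's implementation: the function names, the smoothed inverse-variance weights, the early-stopping rule, and the mean-difference statistic (a simple stand-in for the regression-based tests) are all assumptions made for this example.

```python
import math
import random


def frequency_weights(genotypes):
    """Hypothetical weighting: 1/sqrt(q(1-q)), with q a smoothed allele frequency.

    genotypes is a list of per-sample rows of minor-allele counts (0, 1, or 2).
    Rarer variants receive larger weights.
    """
    n = len(genotypes)
    weights = []
    for j in range(len(genotypes[0])):
        alt = sum(row[j] for row in genotypes)
        q = (alt + 1) / (2 * n + 2)  # smoothing keeps q strictly between 0 and 1
        weights.append(1.0 / math.sqrt(q * (1 - q)))
    return weights


def burden_scores(genotypes, weights):
    """Collapse each sample's variants in a region into one weighted score."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def mean_difference(scores, phenotype):
    """Absolute difference in mean score between cases (1) and controls (0)."""
    cases = [s for s, y in zip(scores, phenotype) if y == 1]
    controls = [s for s, y in zip(scores, phenotype) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def adaptive_permutation_p(scores, phenotype, max_perm=10000,
                           check_every=100, alpha=0.05, seed=0):
    """Permutation p-value that stops early when the result is clearly
    non-significant (assumed rule: lower confidence bound above alpha).

    Returns (p_value_estimate, permutations_used).
    """
    rng = random.Random(seed)
    observed = mean_difference(scores, phenotype)
    labels = list(phenotype)
    exceed = 0
    for i in range(1, max_perm + 1):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if mean_difference(scores, labels) >= observed - 1e-12:
            exceed += 1
        if i % check_every == 0:
            p_hat = (exceed + 1) / (i + 1)
            se = math.sqrt(p_hat * (1 - p_hat) / i)
            if p_hat - 1.96 * se > alpha:
                return p_hat, i  # not promising: stop spending permutations
    return (exceed + 1) / (max_perm + 1), max_perm
```

Under these assumptions, regions with no signal stop after a few hundred permutations, while regions with small p-values use the full permutation budget, which is the general motivation for adaptive permutation in genome-wide scans.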

[1]  Wei Pan,et al.  Comparison of statistical tests for disease association with rare variants , 2011, Genetic epidemiology.

[2]  J. Marchini,et al.  Genotype imputation for genome-wide association studies , 2010, Nature Reviews Genetics.

[3]  M. DePristo,et al.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. , 2010, Genome research.

[4]  H. Hakonarson,et al.  ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data , 2010, Nucleic acids research.

[5]  F. Collins,et al.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits , 2009, Proceedings of the National Academy of Sciences.

[6]  Seunggeun Lee,et al.  General framework for meta-analysis of rare variants in sequencing association studies. , 2013, American journal of human genetics.

[7]  Jacob A. Tennessen,et al.  Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes , 2012, Science.

[8]  P. Bork,et al.  A method and server for predicting damaging missense mutations , 2010, Nature Methods.

[9]  S A Forbes,et al.  The Catalogue of Somatic Mutations in Cancer (COSMIC) , 2008, Current protocols in human genetics.

[10]  Jay Shendure,et al.  Single-nucleotide evolutionary constraint scores highlight disease-causing mutations , 2010, Nature Methods.

[11]  G. Abecasis,et al.  A note on exact tests of Hardy-Weinberg equilibrium. , 2005, American journal of human genetics.

[12]  Wei Pan,et al.  A Data-Adaptive Sum Test for Disease Association with Multiple Common or Rare Variants , 2010, Human Heredity.

[13]  Bo Peng,et al.  Integrated annotation and analysis of genetic variants from next-generation sequencing studies with variant tools , 2012, Bioinform..

[14]  Shuang Wang,et al.  A Fast and Noise‐Resilient Approach to Detect Rare‐Variant Associations With Deep Sequencing Data for Complex Disorders , 2012, Genetic epidemiology.

[15]  S. Browning,et al.  A Groupwise Association Test for Rare Mutations Using a Weighted Sum Statistic , 2009, PLoS genetics.

[16]  S. Henikoff,et al.  Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm , 2009, Nature Protocols.

[17]  V. Salomaa,et al.  Excess of rare variants in genes identified by genome-wide association study of hypertriglyceridemia , 2010, Nature Genetics.

[18]  Shamil R Sunyaev,et al.  Pooled association tests for rare variants in exon-resequencing studies. , 2010, American journal of human genetics.

[19]  Yun Li,et al.  METAL: fast and efficient meta-analysis of genomewide association scans , 2010, Bioinform..

[20]  Manuel A. R. Ferreira,et al.  PLINK: a tool set for whole-genome association and population-based linkage analyses. , 2007, American journal of human genetics.

[21]  Gaurav Bhatia,et al.  A Covering Method for Detecting Genetic Associations between Rare Variants and Common Phenotypes , 2010, PLoS Comput. Biol..

[22]  B. Browning,et al.  A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. , 2009, American journal of human genetics.

[23]  Dan-Yu Lin,et al.  A general framework for detecting disease associations with rare variants in sequencing studies. , 2011, American journal of human genetics.

[24]  E. Zeggini,et al.  An Evaluation of Statistical Approaches to Rare Variant Analysis in Genetic Association Studies , 2009, Genetic epidemiology.

[25]  Yurii S. Aulchenko,et al.  The Empirical Power of Rare Variant Association Methods: Results from Sanger Sequencing in 1,998 Individuals , 2012, PLoS genetics.

[26]  Suzanne M Leal,et al.  Detection of genotyping errors and pseudo‐SNPs via deviations from Hardy‐Weinberg equilibrium , 2005, Genetic epidemiology.

[27]  Dajiang J. Liu,et al.  Meta-Analysis of Gene Level Tests for Rare Variant Association , 2013, Nature Genetics.

[28]  G. Abecasis,et al.  MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes , 2010, Genetic epidemiology.

[29]  Gao T. Wang,et al.  Testing for Rare Variant Associations in the Presence of Missing Data , 2013, Genetic epidemiology.

[30]  K. Lunetta,et al.  Methods in Genetics and Clinical Interpretation Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium Design of Prospective Meta-Analyses of Genome-Wide Association Studies From 5 Cohorts , 2010 .

[31]  G. Abecasis,et al.  Genotype imputation. , 2009, Annual review of genomics and human genetics.

[32]  Jennifer G. Robinson,et al.  Association of low-frequency and rare coding-sequence variants with blood lipids and coronary heart disease in 56,000 whites and blacks. , 2014, American journal of human genetics.

[33]  Hiroyuki Ogata,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 1999, Nucleic Acids Res..

[34]  K. Roeder,et al.  Genomic Control for Association Studies , 1999, Biometrics.

[35]  Simon C. Potter,et al.  Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls , 2007, Nature.

[36]  Kathryn Roeder,et al.  Testing for an Unusual Distribution of Rare Variants , 2011, PLoS genetics.

[37]  D. Jackson,et al.  Exome Sequencing Reveals Comprehensive Genomic Alterations across Eight Cancer Cell Lines , 2011, PloS one.

[38]  Josyf Mychaleckyj,et al.  Robust relationship inference in genome-wide association studies , 2010, Bioinform..

[39]  Suzanne M. Leal,et al.  A Novel Adaptive Method for the Analysis of Next-Generation Sequencing Data to Detect Complex Trait Associations with Rare Variants Due to Gene Main Effects and Interactions , 2010, PLoS genetics.

[40]  Xihong Lin,et al.  Rare-variant association testing for sequencing data with the sequence kernel association test. , 2011, American journal of human genetics.

[41]  H. Muller The American Journal of Human Genetics Vol . 2 No . 2 June 1950 Our Load of Mutations 1 , 2006 .

[42]  V. Barnett,et al.  Applied Linear Statistical Models , 1975 .

[43]  Hongyu Zhao,et al.  Rare independent mutations in renal salt handling genes contribute to blood pressure variation , 2008, Nature Genetics.

[44]  Aleksandar Milosavljevic,et al.  An integrative variant analysis suite for whole exome next-generation sequencing data , 2012, BMC Bioinformatics.

[45]  P. Donnelly,et al.  A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies , 2009, PLoS genetics.

[46]  Jonathan C. Cohen,et al.  Multiple Rare Alleles Contribute to Low Plasma Levels of HDL Cholesterol , 2004, Science.

[47]  C. Greenwood,et al.  Empirical power of very rare variants for common traits and disease: results from sanger sequencing 1998 individuals , 2013, European Journal of Human Genetics.

[48]  Tatiana A. Tatusova,et al.  NCBI Reference Sequences: current status, policy and new initiatives , 2008, Nucleic Acids Res..

[49]  K. Roeder,et al.  Unbiased methods for population‐based association studies , 2001, Genetic epidemiology.

[50]  K. Mossman The Wellcome Trust Case Control Consortium, U.K. , 2008 .

[51]  J. Shendure,et al.  A general framework for estimating the relative pathogenicity of human genetic variants , 2014, Nature Genetics.

[52]  Iuliana Ionita-Laza,et al.  A New Testing Strategy to Identify Rare Variants with Either Risk or Protective Effect on Disease , 2011, PLoS genetics.

[53]  Jason Flannick,et al.  Evaluating empirical bounds on complex disease genetic architecture , 2013, Nature Genetics.

[54]  S. Leal,et al.  Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. , 2008, American journal of human genetics.

[55]  Susumu Goto,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 2000, Nucleic Acids Res..

[56]  Christian Fuchsberger,et al.  Exome array analysis identifies new loci and low-frequency variants influencing insulin processing and secretion , 2012, Nature Genetics.