The Weighting is the Hardest Part: On the Behavior of the Likelihood Ratio Test and the Score Test Under a Data-Driven Weighting Scheme in Sequenced Samples

Sequence-based association studies are at a critical inflection point with the increasing availability of exome-sequencing data. A popular test of association is the sequence kernel association test (SKAT). Weights are embedded within SKAT to reflect the hypothesized contribution of the variants to the trait variance. Because the true weights are generally unknown, and therefore subject to misspecification, we examined the efficiency of a data-driven weighting scheme. We propose using a set of theoretically defensible weighting schemes, and we assume that the scheme yielding the largest test statistic is the one most likely to capture the relationship between allele frequency and functional effect. We show that the use of alternative weights obviates the need to impose arbitrary frequency thresholds. As both the score test and the likelihood ratio test (LRT) may be used in this context, and may differ in power, we characterize the behavior of both tests. The two tests have equal power if the set includes weights resembling the correct ones. However, if the weights are badly specified, the LRT shows superior power, owing to its robustness to misspecification. With this data-driven weighting procedure, the LRT detected significant signal in two genes located in regions already confirmed as associated with schizophrenia, PRRC2A (p = 1.020e-06) and VARS2 (p = 2.383e-06), in the Swedish schizophrenia case-control cohort of 11,040 individuals with exome-sequencing data. The score test is currently preferred for its computational efficiency and power. Indeed, assuming correct specification, the score test is in some circumstances the most powerful test. However, the LRT has the advantage of being generally more robust and more powerful under weight misspecification. This is an important result given that, arguably, misspecified models are likely to be the rule rather than the exception in weighting-based approaches.
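To make the data-driven weighting idea concrete, the following is a minimal sketch, not the authors' implementation. It covers only a score-type statistic for a quantitative trait without covariates; the LRT is not shown. The three candidate weight functions, the permutation-based calibration, and all function names (`weight_schemes`, `skat_q`, `data_driven_test`) are assumptions made for illustration. Because raw statistics are not on a common scale across weighting schemes, the sketch compares schemes through their per-scheme permutation p-values and then calibrates the best one against the same permutations.

```python
import numpy as np

def weight_schemes(maf):
    """Hypothetical candidate weights as functions of minor allele frequency."""
    return {
        "beta_1_25": 25.0 * (1.0 - maf) ** 24,            # Beta(1, 25) density
        "flat": np.ones_like(maf),                         # Beta(1, 1) density
        "madsen_browning": 1.0 / np.sqrt(maf * (1.0 - maf)),
    }

def skat_q(resid, G, w):
    """Score-type statistic Q = sum_j (w_j * g_j' resid)^2."""
    s = G.T @ resid                      # per-variant score contributions
    return float(np.sum((w * s) ** 2))

def data_driven_test(y, G, n_perm=999, seed=0):
    """Pick the best-fitting weight scheme and return a permutation p-value
    that accounts for having searched over the schemes."""
    rng = np.random.default_rng(seed)
    maf = G.mean(axis=0) / 2.0
    keep = (maf > 0) & (maf < 1)         # drop monomorphic variants
    G, maf = G[:, keep], maf[keep]
    schemes = weight_schemes(maf)
    names = list(schemes)
    resid = y - y.mean()                 # null model: intercept only (assumption)

    # Row 0 holds the observed statistics; rows 1..n_perm hold permuted ones.
    q = np.empty((n_perm + 1, len(names)))
    for b in range(n_perm + 1):
        r = resid if b == 0 else rng.permutation(resid)
        q[b] = [skat_q(r, G, schemes[k]) for k in names]

    # p[i, k]: fraction of rows whose statistic under scheme k is >= row i's.
    p = (q[None, :, :] >= q[:, None, :]).mean(axis=1)
    min_p = p.min(axis=1)                # best scheme per row
    best = names[int(np.argmin(p[0]))]
    p_overall = float(np.mean(min_p <= min_p[0]))
    return best, p_overall
```

A call such as `data_driven_test(y, G)` takes a phenotype vector `y` of length n and an n-by-m genotype matrix `G` coded as minor-allele counts (0, 1, 2), and returns the name of the selected scheme together with the adjusted p-value. Permutation is used here only because it keeps the sketch short and self-contained; it is far slower than the analytic null distributions typically used for SKAT.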