Taking population stratification into account by local permutations in rare‐variant association studies on small samples

Many methods for rare-variant association studies require permutations to assess the significance of tests. Standard permutations assume that all individuals are exchangeable and therefore do not take into account population stratification (PS), a known confounding factor in genetic studies. We propose a novel strategy, LocPerm, in which each individual's phenotype is permuted only with those of its closest ancestry-based neighbors. We performed a simulation study, focusing on small samples, to evaluate LocPerm and compare it with standard permutations and with classical adjustment for the first principal components. Under the null hypothesis, LocPerm was the only method providing an acceptable type I error rate, regardless of sample size and level of stratification. The power of LocPerm was similar to that of standard permutations in the absence of PS and remained stable across different PS scenarios. We conclude that LocPerm is a method of choice for taking PS and/or small sample size into account in rare-variant association studies.
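To make the idea of restricting permutations to ancestry-based neighbors concrete, here is a minimal sketch in Python. It is not the authors' implementation, and the abstract does not say how neighbors are defined or how restricted permutations are drawn, so both are assumptions here: neighbors are the k nearest individuals by Euclidean distance on principal-component scores, and restricted permutations are generated by proposing random swaps between an individual and one of its neighbors, accepting a swap only if every individual still receives the phenotype of one of its own neighbors. The test statistic (absolute difference in mean rare-allele burden between cases and controls) is a simple stand-in for whatever rare-variant test is being calibrated. All function names are hypothetical.

```python
import numpy as np


def neighbor_lists(pcs, k):
    """Hypothetical helper: (n, k+1) array whose row i holds i itself followed by
    its k nearest neighbors by Euclidean distance on PC scores (an assumption)."""
    d = np.linalg.norm(pcs[:, None, :] - pcs[None, :, :], axis=-1)
    np.fill_diagonal(d, -1.0)  # self has the smallest "distance", so it ranks first
    return np.argsort(d, axis=1)[:, : k + 1]


def local_permutations(nbrs, n_perm, swaps_per_perm, rng):
    """Hypothetical generator of restricted permutations: individual i receives the
    phenotype of sigma[i], and sigma[i] is always one of i's neighbors."""
    n = nbrs.shape[0]
    allowed = np.zeros((n, n), dtype=bool)
    allowed[np.arange(n)[:, None], nbrs] = True
    sigma = np.arange(n)  # identity: everyone starts with their own phenotype
    for _ in range(n_perm):
        for _ in range(swaps_per_perm):
            i = rng.integers(n)
            j = nbrs[i, rng.integers(nbrs.shape[1])]  # propose a neighbor of i
            # swapping donors keeps sigma a permutation; accept only if both
            # new donors stay inside the respective neighbor sets
            if allowed[i, sigma[j]] and allowed[j, sigma[i]]:
                sigma[i], sigma[j] = sigma[j], sigma[i]
        yield sigma.copy()


def burden_stat(burden, y):
    """Stand-in statistic: |mean burden in cases (y=1) - mean burden in controls (y=0)|."""
    return abs(burden[y == 1].mean() - burden[y == 0].mean())


def perm_pvalue(burden, y, sigmas):
    """Permutation p-value with the usual add-one correction."""
    t_obs = burden_stat(burden, y)
    t_perm = np.array([burden_stat(burden, y[s]) for s in sigmas])
    return (1 + np.sum(t_perm >= t_obs)) / (1 + len(t_perm))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 60
    pop = np.repeat([0, 1], n // 2)  # two hypothetical subpopulations
    pcs = np.column_stack([3.0 * pop, np.zeros(n)]) + rng.normal(scale=0.5, size=(n, 2))
    y = rng.binomial(1, np.where(pop == 1, 0.7, 0.3))  # case rate differs by population
    burden = rng.poisson(np.where(pop == 1, 1.5, 0.5)).astype(float)  # so does burden
    # within each population, burden is independent of y (a stratified null)
    sigmas_std = (rng.permutation(n) for _ in range(199))
    sigmas_loc = local_permutations(neighbor_lists(pcs, k=10), 199, 5 * n, rng)
    print("standard permutation p =", perm_pvalue(burden, y, sigmas_std))
    print("local permutation p    =", perm_pvalue(burden, y, sigmas_loc))
```

Because every accepted move is a swap, each draw remains a true permutation, so the numbers of cases and controls are preserved exactly as in a standard permutation test. In the simulated example, standard permutations shuffle phenotypes across the two subpopulations and so break the confounding between ancestry and burden, whereas the local draws only exchange phenotypes among individuals of similar ancestry. As k grows toward n - 1, every swap is accepted and the procedure approaches an ordinary unrestricted permutation.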

J. Casanova et al. Taking population stratification into account by local permutations in rare-variant association studies on small samples. bioRxiv, 2020.