An Exhaustive Scan Method for SNP Main Effects and SNP × SNP Interactions Over Highly Homozygous Genomes

Genome-wide association studies (GWAS) are a powerful tool for exploring potential relationships between single-nucleotide polymorphisms (SNPs) and biological traits. To screen for important genetic variants, it is desirable to perform an exhaustive scan over the whole genome. However, this is usually computationally challenging, mainly because of the large number of SNPs in GWAS. In this article, we propose a computationally efficient algorithm for highly homozygous genomes. Pseudo standard error (PSE) is known to be a highly efficient and robust estimator for the standard deviation of a quantitative trait. We therefore develop a PSE-based statistical testing procedure for identifying significant SNP main effects and SNP × SNP interactions associated with a quantitative trait. A simulation study is first conducted to evaluate its empirical size and power. The results show that the proposed PSE-based method generally maintains an empirical size sufficiently close to the nominal significance level. However, the power investigation indicates that the method may lack power to identify significant effects for low-frequency variants when their true effect sizes are not large enough. Software implementing the proposed algorithm is provided, and its computational efficiency is evaluated through a second simulation study. An exhaustive scan usually completes within a reasonable runtime, and a rice genome data set is analyzed with the software.
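
The abstract does not define the PSE. As a minimal sketch, assuming it refers to Lenth's pseudo standard error (a robust scale estimate computed from the absolute values of a set of estimated effects), the calculation could look like the following. The function name `lenth_pse` and the example effect values are hypothetical and are not taken from the authors' software.

```python
from statistics import median


def lenth_pse(effects):
    """Lenth-style pseudo standard error of a list of effect estimates.

    Assumption: this is the textbook Lenth estimator, not necessarily
    the exact variant used in the paper.
    """
    abs_eff = [abs(float(e)) for e in effects]
    if not abs_eff:
        raise ValueError("at least one effect estimate is required")

    # Initial robust scale: 1.5 times the median absolute effect.
    s0 = 1.5 * median(abs_eff)

    # Trim effects that look "active" (at least 2.5 * s0 in magnitude),
    # so that large true effects do not inflate the scale estimate.
    trimmed = [a for a in abs_eff if a < 2.5 * s0]
    if not trimmed:
        # Degenerate case (for example, all effects are zero).
        return s0

    # Re-estimate the scale from the remaining, presumably null, effects.
    return 1.5 * median(trimmed)


# Hypothetical effect estimates: six small effects and one large one.
example_effects = [0.1, -0.2, 0.15, -0.05, 3.0, 0.12, -0.08]
example_pse = lenth_pse(example_effects)

# Ratio of each effect to the PSE; a large ratio suggests an active effect.
example_ratios = [e / example_pse for e in example_effects]
```

In this sketch the single large effect is trimmed before the second median is taken, so it does not distort the scale against which the remaining effects are judged. How the ratios are turned into a formal test (reference distribution, multiplicity control) is not stated in the abstract and is left out here.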
