An efficient algorithm to perform multiple testing in epistasis screening

Background: Research on the detection of epistasis, or gene-gene interaction, in human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved efforts to translate statistical epistasis into biological epistasis, and attempts to integrate different omics information sources into epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the number of hypothesis tests. Because the number of SNP pairs grows with the square of the number of SNPs, gene-gene interaction studies require memory proportional to the squared number of SNPs, and a genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT that requires an amount of memory independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn’s disease.

Results: In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions in a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster of 10 blades, each containing four Quad-Core AMD Opteron(tm) 2352 2.1 GHz processors. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn’s disease (CD) data.

Conclusions: Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interaction problems within a few days, without using much memory, while adequately controlling the type I error rate. A new implementation aimed at genome-wide epistasis screening is under development. In the context of Crohn’s disease, MBMDR-3.0.3 identified epistasis involving regions that are well known in the field and that can be explained from a biological point of view. This demonstrates the power of our software to find relevant higher-order phenotype-genotype associations.
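
To illustrate how a permutation-based maxT correction can run in memory that does not depend on the number of tests, the following Python sketch may help. It is not the MBMDR-3.0.3 implementation (which is written in C++) and makes several simplifying assumptions: it uses the single-step form of maxT, it reports adjusted p-values only for the `top_n` largest observed statistics, and the names `max_t_fixed_memory`, `stat`, `tests` and `demo`, as well as the toy test statistic, are hypothetical. The idea is that each pass over the tests keeps either a small heap of the best observed statistics or a single running maximum, so storage is proportional to `top_n + n_perm` rather than to the number of SNP pairs.

```python
import heapq
import itertools
import random


def max_t_fixed_memory(stat, tests, phenotype, n_perm=999, top_n=10, seed=0):
    """Single-step maxT adjusted p-values for the top_n observed statistics.

    stat(test, trait) -> float and tests() -> iterator of test ids are
    hypothetical callbacks; tests are generated lazily and never stored.
    """
    rng = random.Random(seed)

    # Pass over the observed data: keep only the top_n largest statistics.
    top = heapq.nlargest(top_n, ((stat(t, phenotype), t) for t in tests()))

    # One pass per permutation: only the maximum statistic is retained.
    perm_max = []
    trait = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(trait)
        perm_max.append(max(stat(t, trait) for t in tests()))

    # Adjusted p-value: share of permutations whose maximum reaches the
    # observed value (the observed data counts as one extra permutation).
    return [
        (test, value, (1 + sum(m >= value for m in perm_max)) / (n_perm + 1))
        for value, test in top
    ]


def demo():
    """Toy example: 8 simulated SNPs, 40 individuals, binary trait."""
    rng = random.Random(1)
    n_ind, n_snp = 40, 8
    geno = [[rng.randint(0, 2) for _ in range(n_ind)] for _ in range(n_snp)]
    trait = [rng.randint(0, 1) for _ in range(n_ind)]

    def tests():
        # SNP pairs are enumerated on the fly instead of being stored.
        return itertools.combinations(range(n_snp), 2)

    def stat(pair, y):
        # Toy statistic (an assumption, not the MB-MDR statistic): absolute
        # cross-product between the trait and the product of two genotypes.
        a, b = pair
        x = [geno[a][j] * geno[b][j] for j in range(n_ind)]
        mx, my = sum(x) / n_ind, sum(y) / n_ind
        return abs(sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)))

    return max_t_fixed_memory(stat, tests, trait, n_perm=99, top_n=5)


if __name__ == "__main__":
    for pair, value, p_adj in demo():
        print(pair, round(value, 3), p_adj)
```

The trade-off in this sketch is time for memory: every permutation recomputes all statistics, but nothing beyond the `top_n` best pairs and one maximum per permutation is kept. A step-down refinement would yield adjusted p-values that are never larger than these single-step ones, at the cost of extra bookkeeping for the retained pairs.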
