CChi: An efficient cloud epistasis test model in human genome wide association studies

The vast number of SNPs and the resulting huge search space make reducing total computation cost a central challenge in genome-wide association studies (GWAS). Motivated by this problem, we develop an effective and efficient algorithm for epistasis detection in GWAS: a cloud-based algorithm built on the chi-square test, denoted CChi. CChi uses an upper bound to prune a large number of unnecessary SNP pairs and is implemented under Google's MapReduce framework. We also propose a best-fit model for distributing SNP pairs among the reducers. Extensive experimental results demonstrate that CChi is practically and computationally efficient.
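To make the search-space problem concrete, the following minimal sketch computes a Pearson chi-square statistic for one SNP pair against a binary phenotype and then scans all pairs exhaustively. It is an illustration under assumptions, not the CChi implementation: the abstract does not specify the upper bound, the MapReduce job layout, or the best-fit model, so none of them appear here. The function names `chi_square_pair` and `scan_pairs`, the list-based genotype encoding, and the `threshold` parameter are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def chi_square_pair(snp_a, snp_b, phenotype):
    """Pearson chi-square of the joint genotype (snp_a, snp_b) vs. phenotype.

    Assumed encoding: each argument is an equal-length sequence with one
    entry per individual (e.g. genotypes 0/1/2, phenotype 0/1).
    """
    n = len(phenotype)
    joint = list(zip(snp_a, snp_b))
    observed = Counter(zip(joint, phenotype))  # cell counts
    row_totals = Counter(joint)                # per joint genotype
    col_totals = Counter(phenotype)            # per phenotype class
    stat = 0.0
    for row, r in row_totals.items():
        for col, c in col_totals.items():
            expected = r * c / n
            stat += (observed[(row, col)] - expected) ** 2 / expected
    return stat


def scan_pairs(genotypes, phenotype, threshold):
    """Exhaustive baseline: test every SNP pair, keep those above threshold.

    For m SNPs this evaluates m * (m - 1) / 2 pairs, which is the cost
    that a pruning bound is meant to reduce.
    """
    hits = []
    for i, j in combinations(range(len(genotypes)), 2):
        stat = chi_square_pair(genotypes[i], genotypes[j], phenotype)
        if stat >= threshold:
            hits.append((i, j, stat))
    return hits
```

In a pruned variant, a cheap upper bound on the statistic would be checked before calling `chi_square_pair`, and pairs whose bound falls below `threshold` would be skipped; the form of that bound is left open here because the abstract does not state it.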
