HapBoost: A Fast Approach to Boosting Haplotype Association Analyses in Genome-Wide Association Studies

Genome-wide association studies (GWAS) have been successful in identifying genetic variants associated with complex human diseases. In GWAS, multilocus association analyses that exploit linkage disequilibrium (LD), known as haplotype-based analyses, may have greater power than single-locus analyses for detecting disease susceptibility loci. However, the large number of SNPs genotyped in GWAS makes detecting haplotype associations computationally challenging. We present HapBoost, a fast method for finding haplotype associations that can be used to screen the whole genome quickly. We demonstrate the effectiveness of HapBoost on both synthetic and real data sets. The experimental results show that the proposed approach achieves accuracy comparable to that of existing methods while running much faster.
