ParaHaplo 2.0: a program package for haplotype-estimation and haplotype-based whole-genome association study using parallel computing

Background
The use of haplotype-based association tests can improve the power of genome-wide association studies. Since the observed genotypes are unordered pairs of alleles, haplotype phase must be inferred. However, estimating haplotype phase is time-consuming. When millions of single-nucleotide polymorphisms (SNPs) are analyzed in a genome-wide association study, faster methods for haplotype estimation are required.

Methods
We developed a program package for parallel computation of haplotype estimation. Our program package, ParaHaplo 2.0, is intended for use in workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm to that of the regular permutation test on both the Japanese in Tokyo, Japan and the Han Chinese in Beijing, China samples of the HapMap dataset.

Results
The parallel version of ParaHaplo 2.0 can estimate haplotypes 100 times faster than a non-parallel version of ParaHaplo.

Conclusion
ParaHaplo 2.0 is an invaluable tool for conducting haplotype-based genome-wide association studies (GWAS). The need for fast haplotype estimation using parallel computing will become increasingly important as the data sizes of such projects continue to increase. The executable binaries and program sources of ParaHaplo are available at the following address: http://en.sourceforge.jp/projects/parallelgwas/releases/
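
To illustrate why haplotype phase must be inferred from unordered genotypes, the following minimal Python sketch enumerates every haplotype pair consistent with one individual's genotype. It is not ParaHaplo's estimation algorithm. The function name `compatible_haplotype_pairs` and the 0/1/2 genotype coding (number of copies of the alternate allele at each SNP) are assumptions made for this example.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return all unordered haplotype pairs consistent with a genotype.

    genotype: sequence of 0, 1, or 2 (assumed coding: copies of the
    alternate allele at each SNP). Haplotypes are tuples of 0/1.
    """
    # Only heterozygous sites (coded 1) are ambiguous in phase.
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for assignment in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites: both haplotypes carry the same allele (0 or 1).
        h1 = [g // 2 for g in genotype]
        h2 = [g // 2 for g in genotype]
        # Heterozygous sites: the two haplotypes carry opposite alleles.
        for site, allele in zip(het_sites, assignment):
            h1[site] = allele
            h2[site] = 1 - allele
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


if __name__ == "__main__":
    # Two heterozygous SNPs give two possible phasings.
    for pair in sorted(compatible_haplotype_pairs([1, 1])):
        print(pair)
```

An individual with k heterozygous SNPs (k ≥ 1) has 2^(k-1) compatible haplotype pairs, so the candidate space grows exponentially with the number of SNPs. This growth is one reason haplotype estimation becomes costly at genome-wide scale.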
