ParaHaplo: A program package for haplotype-based whole-genome association study using parallel computing

Background: Because more than a million single-nucleotide polymorphisms (SNPs) are analyzed in a typical genome-wide association study (GWAS), multiple comparisons are a serious problem. To cope with this problem, haplotype-based algorithms were developed that correct for multiple comparisons at multiple SNP loci in linkage disequilibrium. A permutation test can also control the problems inherent in multiple testing; however, both the calculation of exact probabilities and the execution of permutation tests are time-consuming. Faster methods for both are required.

Methods: We developed a set of computer programs for the parallel computation of accurate P-values in haplotype-based GWAS. Our program, ParaHaplo, is intended for workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm with that of the regular permutation test on the JPT and CHB populations of HapMap.

Results: ParaHaplo can detect smaller differences between 2 populations than SNP-based GWAS. We also found that parallel-computing techniques made ParaHaplo 100-fold faster than a non-parallel version of the program.

Conclusion: ParaHaplo is a useful tool for conducting haplotype-based GWAS. Since the data sizes of such projects continue to increase, fast computation with parallel computing, such as that used in ParaHaplo, will become increasingly important. The executable binaries and program sources of ParaHaplo are available at the following address: http://sourceforge.jp/projects/parallelgwas/?_sl=1
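To make the idea of a parallel permutation test concrete, the following is a minimal sketch and not ParaHaplo's actual code. It makes several assumptions that do not come from the abstract: it uses a simple per-SNP allelic chi-square statistic instead of the haplotype-based statistic, it uses Python's standard-library process pool as a stand-in for MPI, and all function names (`max_chi2`, `count_exceeding`, `permutation_pvalue`) are hypothetical. The point it illustrates is that permutations are independent, so each worker can shuffle the case/control labels on its own and only the counts need to be combined at the end.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def max_chi2(genotypes, labels):
    """Largest 2x2 allelic chi-square statistic over all SNPs.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i
    at SNP j; labels[i] is 1 for a case and 0 for a control.
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    best = 0.0
    for j in range(len(genotypes[0])):
        # a, b: minor/major allele counts in cases; c, d: same in controls
        a = sum(g[j] for g, y in zip(genotypes, labels) if y == 1)
        c = sum(g[j] for g, y in zip(genotypes, labels) if y == 0)
        b = 2 * n_cases - a
        d = 2 * n_controls - c
        n = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom > 0:
            best = max(best, n * (a * d - b * c) ** 2 / denom)
    return best


def count_exceeding(task):
    """Run one worker's share of permutations and count the exceedances."""
    genotypes, labels, observed, n_perm, seed = task
    rng = random.Random(seed)  # each worker gets its own seed
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_chi2(genotypes, shuffled) >= observed:
            hits += 1
    return hits


def permutation_pvalue(genotypes, labels, n_perm=1000, n_workers=1, seed=0):
    """Permutation P-value for the maximum statistic across SNPs."""
    observed = max_chi2(genotypes, labels)
    # Split the permutations as evenly as possible across workers.
    base, extra = divmod(n_perm, n_workers)
    tasks = [
        (genotypes, labels, observed, base + (1 if w < extra else 0), seed + w)
        for w in range(n_workers)
    ]
    if n_workers == 1:
        hits = count_exceeding(tasks[0])
    else:
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            hits = sum(pool.map(count_exceeding, tasks))
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

When calling `permutation_pvalue` with `n_workers` greater than 1 from a script, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Comparing the observed maximum against the maximum in each permutation is one common way to account for testing many correlated loci at once; an MPI version would follow the same split-then-sum pattern across cluster nodes.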
