An efficient parallel algorithm for haplotype inference based on rule based approach and consensus methods.

With recent advances in bioinformatics technology, the completion of the Human Genome Project, and the exponential growth of genetic data, there is an ever-growing need in genomics to relate genetic variation to observed phenotypes and complex diseases. The most common form of genetic variation is a single-base mutation called a SNP (Single Nucleotide Polymorphism), which results in different alleles being present at a given locus. A set of closely linked SNPs on a particular copy of a chromosome is called a haplotype. Research has shown that haplotype analysis is particularly promising and insightful, which places it at the forefront of bioinformatics investigations, especially because of its significance in complex disease association studies. Current routine genotyping methods typically do not provide haplotype information, which is essential for many analyses of fine-scale molecular-genetic data. Biological methods for haplotype inference are cost-prohibitive and labor-intensive. Hence the search continues for more accurate computational methods that determine haplotypes from abundant and inexpensive, but ambiguous, genotype data. In this work we present two parallel algorithmic approaches based on the inference rule introduced by Clark in 1990 and the consensus method reported by Dr. Steven Hecht Orzack in 2003. The first approach parallelizes the consensus method. Although the consensus method produces results comparable to those of other leading haplotype inference algorithms, its time efficiency can be improved significantly, which makes it practical to investigate this promising method with larger data sets and a greater number of iterations of Clark's algorithm. The parallel algorithm is also used to study the effect of the number of iterations used in the consensus method. The second parallel approach introduces an Enhanced Consensus algorithm that improves upon the average accuracy achieved by the consensus method in a much shorter time.
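To make the two building blocks named above concrete, the following is a minimal serial sketch, not the parallel algorithms of this work. It assumes genotypes are encoded as strings over `0`, `1`, and `2` (with `2` marking a heterozygous site) and that the input holds distinct genotype strings. The function names (`compatible`, `complement`, `clark_run`, `consensus`), the random choice among compatible haplotypes, and the simple majority vote across runs are illustrative assumptions.

```python
import random
from collections import Counter


def compatible(h, g):
    """True if haplotype h agrees with genotype g at every homozygous site."""
    return all(gs == "2" or gs == hs for hs, gs in zip(h, g))


def complement(h, g):
    """The haplotype that, paired with h, yields genotype g."""
    return "".join(
        ("1" if hs == "0" else "0") if gs == "2" else gs
        for hs, gs in zip(h, g)
    )


def clark_run(genotypes, rng):
    """One pass of a Clark-style inference rule in a random order (assumed variant)."""
    known, phase = [], {}
    # Genotypes with at most one heterozygous site are unambiguous.
    for g in genotypes:
        if g.count("2") <= 1:
            h1, h2 = g.replace("2", "0"), g.replace("2", "1")
            phase[g] = tuple(sorted((h1, h2)))
            for h in (h1, h2):
                if h not in known:
                    known.append(h)
    unresolved = [g for g in genotypes if g not in phase]
    progress = True
    while progress and unresolved:
        progress = False
        rng.shuffle(unresolved)
        for g in unresolved:
            candidates = [h for h in known if compatible(h, g)]
            if candidates:
                h = rng.choice(candidates)
                other = complement(h, g)
                phase[g] = tuple(sorted((h, other)))
                if other not in known:
                    known.append(other)
                unresolved.remove(g)
                progress = True
                break  # restart the scan with the enlarged haplotype pool
    return phase  # genotypes left unresolved are simply absent


def consensus(genotypes, runs=100, seed=0):
    """Majority vote over many randomized runs (a simplification)."""
    rng = random.Random(seed)
    votes = {g: Counter() for g in genotypes}
    for _ in range(runs):
        for g, pair in clark_run(genotypes, rng).items():
            votes[g][pair] += 1
    return {g: c.most_common(1)[0][0] for g, c in votes.items() if c}


result = consensus(["0101", "0121", "2121"], runs=50)
```

In this sketch each run depends only on the genotype list and its own random stream, so the runs could in principle be distributed across workers and their votes merged afterwards.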
