Searching SNP Combinations Related to Evolutionary Information of Human Populations on HapMap Data

The International HapMap Project is a partnership of scientists and funding agencies from different countries to develop a public resource that helps researchers find genes associated with human disease and response to pharmaceuticals. The project has collected large amounts of SNP (single-nucleotide polymorphism) data from individuals of different human populations. Many researchers have extracted evolutionary information from these SNP data, but finding all the SNPs related to human evolution remains difficult. In most cases, these SNPs act together to produce the differences between human populations. The number of SNP combinations is very large, so checking all of them is infeasible. In this paper, a novel algorithm is proposed to find the SNP combinatorial patterns whose frequencies differ markedly between two populations. The number of such multi-SNP combinations is taken as the difference between each pair of human populations, and a hierarchical clustering algorithm is then used to construct evolutionary trees for the populations. The trees from 4 chromosomes are consistent and the result is supported by the existing literature, which indicates that the evolutionary information is well mined. The multi-SNP combinations found by our method can be studied further in many respects.
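The pipeline described above (count the multi-SNP combinations whose pattern frequencies differ between two populations, use that count as a pairwise distance, then cluster hierarchically) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify it, and the sketch enumerates every SNP pair exhaustively, which is exactly what becomes infeasible on real HapMap data. The toy genotypes, the combination size of 2, the 0.5 frequency threshold, the average-linkage rule, and all function names are assumptions made for illustration.

```python
from itertools import combinations

def pattern_frequencies(genotypes, snps):
    """Frequency of each allele pattern observed at the given SNP indices."""
    counts = {}
    for person in genotypes:
        key = tuple(person[i] for i in snps)
        counts[key] = counts.get(key, 0) + 1
    return {k: v / len(genotypes) for k, v in counts.items()}

def count_differing_combinations(pop_a, pop_b, size=2, threshold=0.5):
    """Count SNP combinations with a pattern whose frequency differs by >= threshold.

    Exhaustive enumeration: only workable for toy inputs (assumption, not the paper's method).
    """
    total = 0
    for snps in combinations(range(len(pop_a[0])), size):
        fa = pattern_frequencies(pop_a, snps)
        fb = pattern_frequencies(pop_b, snps)
        patterns = set(fa) | set(fb)
        if any(abs(fa.get(p, 0) - fb.get(p, 0)) >= threshold for p in patterns):
            total += 1
    return total

def average_linkage_tree(names, dist):
    """Agglomerative clustering with average linkage; returns a nested-tuple tree."""
    clusters = [(name, [name]) for name in names]  # (subtree, member names)

    def avg(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Hypothetical toy data: rows are individuals, columns are SNPs (0/1 alleles).
populations = {
    "A": [(0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0)],
    "B": [(0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 0, 1)],
    "C": [(1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 1, 0)],
}

# Pairwise distance = number of differing SNP pairs between two populations.
dist = {
    frozenset((p, q)): count_differing_combinations(populations[p], populations[q])
    for p, q in combinations(populations, 2)
}
tree = average_linkage_tree(list(populations), dist)
print(dist)
print(tree)
```

In this toy input, populations A and B share most patterns and merge first, while C joins last. On real data the search over combinations would need pruning or a smarter enumeration strategy, which is the problem the proposed algorithm is meant to address.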
