Sequence Analysis Based Adaptive Hierarchical Clustering Approach for Admixture Population Structure Inference

Population structure inference is an important problem in many areas of human genetics, but inferring the structure of an admixture population is particularly difficult. Traditional Bayesian methods are often time-consuming and may run into convergence problems. We therefore propose a novel approach for rapidly inferring admixture population stratification from genotype data. A feature selection step reduces the cost of inference and filters out noise. The genetic distance between each pair of individuals is calculated with a sequence analysis algorithm, and the resulting distance matrix is passed to an adaptive hierarchical clustering algorithm that infers the population structure. Compared with software based on Bayesian methods (e.g., STRUCTURE), our approach is computationally more efficient and yields a more accurate stratification of the admixture population.
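The three stages named above (feature selection, pairwise genetic distance, hierarchical clustering on the distance matrix) can be illustrated with a minimal sketch. The abstract does not specify the selection criterion, the sequence analysis algorithm, or the adaptive rule, so everything concrete here is an assumption made for illustration: genotypes are coded as minor-allele counts (0, 1, 2), loci are filtered by a variance threshold, distance is a simple mean allele-count difference standing in for the sequence analysis step, and a fixed distance cutoff `cut` stands in for the adaptive stopping rule. All function names and parameters are hypothetical.

```python
from itertools import combinations


def select_informative_loci(genotypes, min_variance=0.1):
    """Assumed feature selection: keep loci whose variance across individuals
    is at least min_variance (constant loci carry no structure signal)."""
    n = len(genotypes)
    keep = []
    for j in range(len(genotypes[0])):
        column = [g[j] for g in genotypes]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        if variance >= min_variance:
            keep.append(j)
    return keep


def allele_distance(a, b, loci):
    """Stand-in distance: mean absolute difference in allele counts,
    scaled to [0, 1]. Not the paper's sequence analysis algorithm."""
    return sum(abs(a[j] - b[j]) for j in loci) / (2 * len(loci))


def cluster(genotypes, loci, cut=0.5):
    """Average-linkage agglomerative clustering on the pairwise distances.
    Merging stops once the closest pair of clusters is farther apart than
    `cut` (a fixed threshold assumed here in place of an adaptive rule)."""
    n = len(genotypes)
    dist = {
        (i, j): allele_distance(genotypes[i], genotypes[j], loci)
        for i, j in combinations(range(n), 2)
    }

    def linkage(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        (p, q), d = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cut:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # p < q, so index p is unaffected
    return [sorted(c) for c in clusters]


# Toy data: six individuals, five loci; locus 3 is constant and gets filtered out.
genotypes = [
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [2, 2, 2, 1, 2],
    [2, 1, 2, 1, 2],
    [2, 2, 1, 1, 2],
]
loci = select_informative_loci(genotypes)
groups = cluster(genotypes, loci)
```

On this toy input the constant locus is dropped and the six individuals fall into two groups of three. A real analysis would need a distance and a stopping criterion suited to admixed individuals, whose ancestry is split across source populations rather than belonging to one cluster.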
