Summarizing Genome-wide Phased Genotypes using Phased PC Plots

Ordination methods in reduced space, such as principal component (PC) analysis, and their visual representation in PC plots can help uncover important patterns among samples in high-dimensional data sets. Applied to data sets obtained from genome-wide genotyping, they can show biologically relevant relationships among populations, such as population structure and admixture. Extending PC analysis to genome-wide phased genotypes may help reveal different levels of inbreeding between or within populations, and may also help evaluate the quality of the haplotyping technique used. We have developed a method to perform PC analysis on a data set of genome-wide phased genotypes and to plot the results while retaining information about the individuals. The method has been implemented in the computer program PCPhaser. To increase the method's applicability and reduce development time, PCPhaser implements it by transforming the input data set, segregating the haplotypes, and then using the EIGENSOFT software to perform the PC analysis. Because of this transformation, the proposed method can be applied with any other software able to perform PCA, although PCPhaser is still required to draw the phased PC plots. PCPhaser is Linux-based software that can be downloaded from http://bios.ugr.es/PCPhaser.
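The haplotype-segregation step described above can be illustrated with a minimal Python sketch. It is not PCPhaser's code: the function names (`segregate_haplotypes`, `pair_coordinates`), the 0/1 allele coding, the in-memory data layout, and the optional doubling to 0/2 for tools that expect diploid genotypes are all assumptions made for illustration. The idea shown is only that each phased individual is split into two haploid pseudo-samples before PCA, and that the resulting coordinates are regrouped per individual afterwards so that a plot can keep both haplotypes of an individual together.

```python
from typing import Dict, List, Sequence, Tuple

Haplotype = List[int]          # one allele (0 or 1) per SNP; assumed coding
Label = Tuple[str, str]        # (individual id, haplotype tag "A" or "B")


def segregate_haplotypes(
    phased: Dict[str, Tuple[Haplotype, Haplotype]],
    as_homozygous: bool = False,
) -> Tuple[List[Label], List[List[int]]]:
    """Split each phased individual into two haploid pseudo-samples.

    If as_homozygous is True, alleles are doubled (0/2) so that each
    pseudo-sample looks like a homozygous diploid genotype. Whether a
    given PCA tool needs this is an assumption, not taken from the source.
    """
    scale = 2 if as_homozygous else 1
    labels: List[Label] = []
    rows: List[List[int]] = []
    for ind_id, (hap_a, hap_b) in phased.items():
        if len(hap_a) != len(hap_b):
            raise ValueError(f"haplotype lengths differ for {ind_id}")
        for tag, hap in (("A", hap_a), ("B", hap_b)):
            labels.append((ind_id, tag))
            rows.append([scale * allele for allele in hap])
    return labels, rows


def pair_coordinates(
    labels: Sequence[Label],
    coords: Sequence[Tuple[float, float]],
) -> Dict[str, List[Tuple[float, float]]]:
    """Regroup per-pseudo-sample PC coordinates by original individual."""
    paired: Dict[str, List[Tuple[float, float]]] = {}
    for (ind_id, _tag), point in zip(labels, coords):
        paired.setdefault(ind_id, []).append(point)
    return paired


# Toy input: two individuals, four SNPs (made-up values).
phased = {
    "ind1": ([0, 1, 1, 0], [0, 1, 1, 0]),   # identical haplotypes
    "ind2": ([1, 0, 0, 1], [0, 1, 1, 0]),   # differing haplotypes
}
labels, rows = segregate_haplotypes(phased)
# `rows` would be handed to any PCA implementation; the coordinates it
# returns, in the same row order, can be regrouped with pair_coordinates.
```

In such a sketch, an individual whose two haplotypes are identical yields two identical rows and therefore two coincident points after PCA, whereas an individual with differing haplotypes yields two separated points. Keeping the `labels` list alongside the matrix is what allows the two points to be tied back to the same individual when plotting.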
