FlashPCA2: principal component analysis of biobank-scale genotype datasets

Motivation: Principal component analysis (PCA) is a crucial step in quality control of genomic data and a common approach for understanding population genetic structure. With the advent of large genotyping studies involving hundreds of thousands of individuals, standard approaches are no longer computationally feasible.

We present FlashPCA2, a tool that can perform PCA on 1 million individuals faster than competing approaches while requiring substantially less memory.

Availability: https://github.com/gabraham/flashpca

Contact: gad.abraham@unimelb.edu.au
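The abstract does not describe FlashPCA2's algorithm. The following is only a minimal, illustrative sketch of what "PCA of a genotype matrix" involves, and it is not FlashPCA2's implementation. It assumes genotypes are coded as 0/1/2 allele dosages, with individuals as rows and SNPs as columns, and that each SNP is standardized by its sample allele frequency p as (x - 2p) / sqrt(2p(1 - p)). It then finds the top eigenvectors of the n x n relationship matrix X X^T / m by power iteration with deflation. Because it uses only matrix-vector products, the n x n matrix is never stored, which is the general idea behind memory-light iterative eigensolvers. Production tools use much faster solvers. The function names (`standardize`, `gram_matvec`, `top_pcs`) and the toy data are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def standardize(genotypes):
    """Centre and scale each SNP (column) of an n x m matrix of 0/1/2 dosages.

    Each SNP is transformed as (x - 2p) / sqrt(2p(1 - p)), where p is the
    sample allele frequency. Monomorphic SNPs (p == 0 or p == 1) are dropped.
    """
    n = len(genotypes)
    kept_columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2.0 * n)
        if p <= 0.0 or p >= 1.0:
            continue  # no variation, so the SNP carries no information
        sd = math.sqrt(2.0 * p * (1.0 - p))
        kept_columns.append([(x - 2.0 * p) / sd for x in col])
    # Transpose the kept columns back to an n x m' row-major matrix.
    return [list(row) for row in zip(*kept_columns)]


def gram_matvec(X, v):
    """Return (X X^T / m) v without forming the n x n matrix X X^T."""
    m = len(X[0])
    u = [sum(X[i][j] * v[i] for i in range(len(X))) for j in range(m)]  # X^T v
    return [dot(row, u) / m for row in X]  # X (X^T v) / m


def top_pcs(X, k, iters=500, seed=0):
    """Top-k eigenpairs of X X^T / m by power iteration with deflation.

    Returns (eigenvalues, eigenvectors). Each eigenvector has one entry per
    individual and is proportional to that individual's PC score.
    """
    rng = random.Random(seed)
    n = len(X)
    vals, vecs = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = gram_matvec(X, v)
            # Deflation: project out the eigenvectors already found.
            for u in vecs:
                d = dot(w, u)
                w = [wi - d * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [wi / norm for wi in w]
            lam = norm  # for a unit v, ||A v|| approaches the eigenvalue
        vals.append(lam)
        vecs.append(v)
    return vals, vecs


# Hypothetical toy data: 6 individuals x 6 SNPs, two groups of three.
TOY_GENOTYPES = [
    [0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [2, 2, 1, 2, 2, 2],
    [2, 1, 2, 2, 1, 2],
    [1, 2, 2, 2, 2, 1],
]

eigenvalues, pcs = top_pcs(standardize(TOY_GENOTYPES), k=2)
print("PC1 scores:", [round(x, 3) for x in pcs[0]])
```

On this toy matrix, PC1 places the first three individuals on one side of zero and the last three on the other. This mirrors how leading PCs capture population structure. The pure-Python loops are for readability only. At biobank scale, the same matrix-vector products would need to be computed blockwise over the genotype file with an optimized eigensolver.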
