PLINK: a tool set for whole-genome association and population-based linkage analyses.

Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
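As an illustration of the identity-by-state (IBS) information mentioned above, the following minimal sketch computes the average proportion of alleles that two individuals share IBS across a set of biallelic markers. It is not PLINK's implementation. The function name `ibs_proportion`, the 0/1/2 allele-count coding, and the use of `None` for missing genotypes are all assumptions made for this example.

```python
from typing import Optional, Sequence


def ibs_proportion(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Average proportion of alleles shared identical by state (IBS).

    Assumed coding (hypothetical, for illustration only): each genotype is the
    count of one reference allele at a biallelic marker (0, 1 or 2), and None
    marks a missing genotype. Genotypes are unphased.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same markers")

    shared_alleles = 0
    markers_used = 0
    for ga, gb in zip(a, b):
        # Skip markers where either individual has a missing genotype.
        if ga is None or gb is None:
            continue
        # With allele-count coding, the number of alleles shared IBS at a
        # marker is 2 - |ga - gb|: 2 for equal genotypes, 0 for opposite
        # homozygotes, 1 otherwise.
        shared_alleles += 2 - abs(ga - gb)
        markers_used += 1

    if markers_used == 0:
        raise ValueError("no markers genotyped in both individuals")

    # Each marker contributes at most 2 shared alleles.
    return shared_alleles / (2 * markers_used)
```

Computed for every pair of individuals in a sample, values of this kind give a pairwise similarity matrix, which is the sort of input that stratification detection and relatedness checks typically start from.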
