Best Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium

Genotyping arrays are a cost-effective approach for typing previously identified genetic polymorphisms in large numbers of samples. One limitation of genotyping arrays for rare variants (e.g., minor allele frequency [MAF] <0.01) is that automated clustering algorithms have difficulty accurately detecting and assigning genotype calls. Combining intensity data from large numbers of samples may improve the accuracy of genotype calls for rare variants. Approximately 62,000 ethnically diverse samples from eleven Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium cohorts were genotyped with the Illumina HumanExome BeadChip across seven genotyping centers. The raw data files for the samples were assembled into a single project for joint calling. To assess the quality of the joint calling, we analyzed genotype concordance in a subset of individuals who had both exome chip and exome sequence data. After excluding low-performing SNPs on the exome chip and SNPs that did not overlap with the sequence-derived variants, genotypes of 185,119 variants (11,356 of which were monomorphic) were compared in 530 individuals with whole exome sequence data. A total of 98,113,070 pairs of genotypes were tested: 99.77% were concordant, 0.14% had missing data, and 0.09% were discordant. We report that joint calling enables accurate genotyping of rare variation using array technology when large sample sizes are available and best practices are followed. The cluster file from this experiment is available at www.chargeconsortium.com/main/exomechip.
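
The concordance comparison described above can be illustrated with a minimal sketch. It is not the consortium's pipeline: the function name `concordance_summary`, the dictionary layout keyed by (sample, variant), and the encoding of genotypes as alternate-allele counts with `None` for a missing call are all assumptions made for illustration. The sketch classifies each chip/sequence genotype pair as concordant, missing, or discordant, and checks that 185,119 variants in 530 individuals give the number of pairs reported.

```python
from collections import Counter

# Figures taken from the abstract.
N_VARIANTS = 185_119
N_SAMPLES = 530
EXPECTED_PAIRS = N_VARIANTS * N_SAMPLES  # 98,113,070 genotype pairs


def concordance_summary(chip, seq):
    """Classify genotype pairs shared by two call sets.

    chip, seq: dicts mapping (sample_id, variant_id) to a genotype,
    encoded here (an assumption) as the alternate-allele count 0, 1 or 2,
    or None when the call is missing.
    Returns the fraction of pairs in each class.
    """
    counts = Counter()
    # Only (sample, variant) keys present in both call sets are compared.
    for key in chip.keys() & seq.keys():
        a, b = chip[key], seq[key]
        if a is None or b is None:
            counts["missing"] += 1
        elif a == b:
            counts["concordant"] += 1
        else:
            counts["discordant"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: counts[k] / total for k in ("concordant", "missing", "discordant")}
```

Applied to real data, the two dictionaries would be filled from the array calls and the sequence calls after the SNP exclusions described above; the three fractions then correspond to the concordant, missing, and discordant percentages reported.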
