Phenome-wide association study (PheWAS) in EMR-linked pediatric cohorts, genetically links PLCL1 to speech language development and IL5-IL13 to Eosinophilic Esophagitis

Objective: We report the first pediatric-specific phenome-wide association study (PheWAS) using electronic medical records (EMRs). Given the early success of PheWAS in adult populations, we investigated the feasibility of this approach in pediatric cohorts, evaluating associations between previously known genetic variants and a wide range of clinical or physiological traits. Although computationally intensive, this approach has the potential to reveal mechanistic relationships between a variant and a network of phenotypes.

Method: Data on 5049 samples of European ancestry, from five genotyped cohorts, were obtained from the EMRs of two large academic centers. These samples recently underwent whole-genome imputation. After standard quality control and removal of missing data and of outliers based on principal components analysis (PCA), 4268 samples were used for the PheWAS. We scanned for associations between 2476 single-nucleotide polymorphisms (SNPs) from previously published GWAS that had available genotyping data and 539 EMR-derived phenotypes. The false discovery rate was calculated and, for any new PheWAS findings, a permutation approach (with up to 1,000,000 trials) was implemented.

Results: This PheWAS detected a variety of common variants (MAF > 10%) with prior GWAS associations in our pediatric cohorts, including associations with juvenile rheumatoid arthritis (JRA), asthma, autism and pervasive developmental disorder (PDD), and type 1 diabetes, at a false discovery rate < 0.05 and study power above 80%.
In addition, several new PheWAS findings were identified, including a cluster of associations near the NDFIP1 gene with mental retardation (best SNP rs10057309, p = 4.33 × 10⁻⁷, OR = 1.70, 95% CI = 1.38-2.09); an association near the PLCL1 gene with developmental delays and speech disorder (best SNP rs1595825, p = 1.13 × 10⁻⁸, OR = 0.65, 95% CI = 0.57-0.76); a cluster of associations in the IL5-IL13 region, previously implicated in asthma, allergy, and eosinophilia, with eosinophilic esophagitis (EoE) (best SNP rs12653750, p = 3.03 × 10⁻⁹, OR = 1.73, 95% CI = 1.44-2.07); and associations of variants in GCKR and JAZF1, previously linked to metabolic disease and diabetes in adults, with allergic rhinitis in our pediatric cohorts (best SNP rs780093, p = 2.18 × 10⁻⁵, OR = 1.39, 95% CI = 1.19-1.61).

Conclusion: As in previous adult studies, the PheWAS approach, applied with re-mapped ICD-9 structured codes to our European-ancestry pediatric cohorts, recovers many previously reported associations and reveals new associations with potentially important clinical implications.
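The two multiple-testing steps named in the Method (a false discovery rate across the SNP-by-phenotype scan, then a permutation check for new findings) can be illustrated with a minimal Python sketch. The abstract does not say which FDR procedure or which permutation statistic was used, so the Benjamini-Hochberg adjustment and the mean-dosage-difference statistic below are assumptions, and the function names are hypothetical rather than taken from the study's pipeline.

```python
import numpy as np

# Scan size from the abstract: an upper bound, assuming every SNP is
# tested against every phenotype (the abstract does not confirm this).
N_SNPS = 2476
N_PHENOTYPES = 539
N_TESTS = N_SNPS * N_PHENOTYPES  # 1,334,564 SNP-phenotype pairs


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the abstract only says an FDR "was calculated"; BH is
    one common choice, not necessarily the one the authors used.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q


def permutation_pvalue(dosage, is_case, n_perm=10_000, seed=0):
    """Empirical p-value obtained by shuffling case/control labels.

    Hypothetical statistic: absolute difference in mean allele dosage
    between cases and controls. The study reports up to 1,000,000
    trials; a smaller default keeps this sketch fast.
    """
    rng = np.random.default_rng(seed)
    g = np.asarray(dosage, dtype=float)
    y = np.asarray(is_case, dtype=bool)

    def stat(labels):
        return abs(g[labels].mean() - g[~labels].mean())

    observed = stat(y)
    hits = sum(stat(rng.permutation(y)) >= observed for _ in range(n_perm))
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

With the add-one correction, the smallest p-value that n_perm trials can report is 1 / (n_perm + 1). That is why confirming a signal near 10⁻⁷ to 10⁻⁹ requires trial counts on the order of the 1,000,000 mentioned in the Method, or more.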

[1]  J. Kang,et al.  Susceptibility influence of a PTPN22 haplotype with thyroid autoimmunity in Koreans , 2011, Diabetes/metabolism research and reviews.

[2]  J. Corren Inhibition of interleukin-5 for the treatment of eosinophilic diseases. , 2012, Discovery medicine.

[3]  K. Dewar,et al.  Allele-specific chromatin remodeling in the ZPBP2/GSDMB/ORMDL3 locus associated with the risk of asthma and autoimmune disease. , 2009, American journal of human genetics.

[4]  R. A. Bailey,et al.  Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes , 2007, Nature Genetics.

[5]  P. Zheng,et al.  SLC22A4 and SLC22A5 gene polymorphisms and Crohn's disease in the Chinese Han population , 2009, Journal of digestive diseases.

[6]  Johannes-Peter Haas,et al.  Genome-wide association analysis of juvenile idiopathic arthritis identifies a new susceptibility locus at chromosomal region 3q13. , 2012, Arthritis and rheumatism.

[7]  Ryan D. Hernandez,et al.  Meta-analysis of Genome-wide Association Studies of Asthma In Ethnically Diverse North American Populations , 2011, Nature Genetics.

[8]  Keith Marsolo,et al.  EMR-linked GWAS study: investigation of variation landscape of loci for body mass index in children , 2013, Front. Genet..

[9]  J. Klein,et al.  Adoption of Body Mass Index Guidelines for Screening and Counseling In Pediatric Practice , 2010, Pediatrics.

[10]  Helen Schuilenburg,et al.  Genome-wide association study and meta-analysis finds over 40 loci affect risk of type 1 diabetes , 2009, Nature Genetics.

[11]  Peter Szolovits,et al.  Association between low density lipoprotein and rheumatoid arthritis genetic factors with low density lipoprotein levels in rheumatoid arthritis and non-rheumatoid arthritis controls , 2013, Annals of the rheumatic diseases.

[12]  E. Boerwinkle,et al.  Association of rs780094 in GCKR with Metabolic Traits and Incident Diabetes and Cardiovascular Disease: The ARIC Study , 2010, PloS one.

[13]  Marylyn D. Ritchie,et al.  Imputation and quality control steps for combining multiple genome-wide datasets , 2014, Front. Genet..

[14]  D. Pinto,et al.  Rare deletions at the neurexin 3 locus in autism spectrum disorder. , 2012, American journal of human genetics.

[15]  J. Denny,et al.  Phenome-Wide Association Study , 2016 .

[16]  J. Kere,et al.  Association analysis of common variants of STAT6, GATA3, and STAT4 to asthma and high serum IgE phenotypes. , 2005, The Journal of allergy and clinical immunology.

[17]  Joseph T. Glessner,et al.  Common variants at 5q22 associate with pediatric eosinophilic esophagitis , 2010, Nature Genetics.

[18]  John P A Ioannidis,et al.  Meta-analysis in genome-wide association studies. , 2009, Pharmacogenomics.

[19]  Patrice Degoulet,et al.  Phenome-Wide Association Studies on a Quantitative Trait: Application to TPMT Enzyme Activity and Thiopurine Therapy in Pharmacogenomics , 2013, PLoS Comput. Biol..

[20]  Melissa A. Basford,et al.  Genome- and Phenome-Wide Analyses of Cardiac Conduction Identifies Markers of Arrhythmia Risk , 2013, Circulation.

[21]  Don L. Armstrong,et al.  High-density genotyping of STAT4 reveals multiple haplotypic associations with systemic lupus erythematosus in different racial groups. , 2009, Arthritis and rheumatism.

[22]  J. Marchini,et al.  Genotype Imputation with Thousands of Genomes , 2011, G3: Genes | Genomes | Genetics.

[23]  J. Yamada,et al.  Dysfunction of Extrasynaptic GABAergic Transmission in Phospholipase C-Related, but Catalytically Inactive Protein 1 Knockout Mice Is Associated with an Epilepsy Phenotype , 2012, Journal of Pharmacology and Experimental Therapeutics.

[24]  Tariq Ahmad,et al.  Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci , 2010, Nature Genetics.

[25]  E. Groot,et al.  An IL-13 promoter polymorphism associated with increased risk of allergic asthma , 1999, Genes and Immunity.

[26]  M. Gratacós,et al.  Association of Neurexin 3 polymorphisms with smoking behavior , 2012, Genes, brain, and behavior.

[27]  C. Gieger,et al.  Human metabolic individuality in biomedical and pharmaceutical research , 2011, Nature.

[28]  David S Sanders,et al.  Newly identified genetic risk variants for celiac disease related to the immune response , 2008, Nature Genetics.

[29]  Manuel A. R. Ferreira,et al.  PLINK: a tool set for whole-genome association and population-based linkage analyses. , 2007, American journal of human genetics.

[30]  Yi Li,et al.  Interleukin-23 receptor genetic polymorphisms and Crohn’s disease susceptibility: a meta-analysis , 2010, Inflammation Research.

[31]  Peter Almgren,et al.  Mechanisms by which common variants in the TCF7L2 gene increase risk of type 2 diabetes. , 2007, The Journal of clinical investigation.

[32]  J. Britto,et al.  Ndfip1 is required for the development of pyramidal neuron dendrites and spines in the neocortex. , 2014, Cerebral cortex.

[33]  T. Spector,et al.  Identification of PLCL1 Gene for Hip Bone Size Variation in Females in a Genome-Wide Association Study , 2008, PloS one.

[34]  S. Gidding The rationale for lowering serum cholesterol levels in American children. , 1993, American journal of diseases of children.

[35]  C. Gieger,et al.  IL12A, MPHOSPH9/CDK2AP1 and RGS1 are novel multiple sclerosis susceptibility loci , 2010, Genes and Immunity.

[36]  Springer Basel Ag Interleukin-23 receptor genetic polymorphisms and Crohn's disease susceptibility: a meta-analysis , 2010 .

[37]  D. Clayton,et al.  Genome-wide association study and meta-analysis finds over 40 loci affect risk of type 1 diabetes , 2009, Nature Genetics.

[38]  D. Reich,et al.  Principal components analysis corrects for stratification in genome-wide association studies , 2006, Nature Genetics.

[39]  J. Sutcliffe,et al.  Variation in ITGB3 is associated with whole-blood serotonin level and autism susceptibility , 2006, European Journal of Human Genetics.

[40]  Chuong B. Do,et al.  A genome-wide association meta-analysis of self-reported allergy identifies shared and allergy-specific susceptibility loci , 2013, Nature Genetics.

[41]  Marylyn D. Ritchie,et al.  Phenome-Wide Association Study (PheWAS) for Detection of Pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) Network , 2013, PLoS genetics.

[42]  G. Abecasis,et al.  Genotype imputation. , 2009, Annual review of genomics and human genetics.

[43]  K. Clément,et al.  Deficiency and pharmacological stabilization of mast cells reduce diet-induced obesity and diabetes in mice , 2009, Nature Medicine.

[44]  K. Kohara,et al.  The GCKR rs780094 polymorphism is associated with susceptibility of type 2 diabetes, reduced fasting plasma glucose levels, increased triglycerides levels and lower HOMA-IR in Japanese population , 2010, Journal of Human Genetics.

[45]  Andrew M. Rupert,et al.  Genome-wide association analysis of eosinophilic esophagitis provides insight into the tissue specificity of this allergic disease , 2014, Nature Genetics.

[46]  D. Postma,et al.  Interleukin 13, CD14, pet and tobacco smoke influence atopy in three Dutch cohorts: the allergenic study , 2008, European Respiratory Journal.

[47]  E. Cook Autism: Review of neurochemical investigation , 1990, Synapse.

[48]  Wj Gauderman,et al.  QUANTO 1.1: A computer program for power and sample size calculations for genetic-epidemiology studies , 2006 .

[49]  Christian Gieger,et al.  Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture , 2013, Nature Genetics.

[50]  Melissa A. Basford,et al.  Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data , 2013, Nature Biotechnology.

[51]  Marylyn D. Ritchie,et al.  PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations , 2010, Bioinform..

[52]  N J Cox,et al.  Evidence of linkage between the serotonin transporter and autistic disorder , 1997, Molecular Psychiatry.

[53]  Christian Gieger,et al.  A genome-wide association study of plasma total IgE concentrations in the Framingham Heart Study. , 2012, The Journal of allergy and clinical immunology.

[54]  J. Karsh,et al.  The Association Between Allergy and Diabetes in the Canadian Population: Implications for the Th1-Th2 Hypothesis , 2005, European Journal of Epidemiology.

[55]  C Kooperberg,et al.  The use of phenome‐wide association studies (PheWAS) for exploration of novel genotype‐phenotype relationships and pleiotropy discovery , 2011, Genetic epidemiology.

[56]  T. Bourgeron,et al.  Linkage and association of the glutamate receptor 6 gene with autism , 2002, Molecular Psychiatry.

[57]  A. Barski,et al.  IL-33 Markedly Activates Murine Eosinophils by an NF-κB–Dependent Mechanism Differentially Dependent upon an IL-4–Driven Autoinflammatory Loop , 2013, The Journal of Immunology.

[58]  David M. Evans,et al.  Variability in the common genetic architecture of social-communication spectrum phenotypes during childhood and adolescence , 2014, Molecular Autism.

[59]  T. Biederer,et al.  Expression and adhesion profiles of SynCAM molecules indicate distinct neuronal functions , 2008, The Journal of comparative neurology.

[60]  Melissa A. Basford,et al.  Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. , 2011, American journal of human genetics.

[61]  Winkelmann,et al.  IL 12 A , MPHOSPH 9 / CDK 2 AP 1 and RGS 1 are novel multiple sclerosis susceptibility loci , 2010 .

[62]  Anbupalam Thalamuthu,et al.  TRAF1-C5 as a risk locus for rheumatoid arthritis--a genomewide study. , 2007, The New England journal of medicine.

[63]  I. Sayers,et al.  Defining the contribution of SNPs identified in asthma GWAS to clinical variables in asthmatic children , 2013, BMC Medical Genetics.

[64]  Joshua C. Denny,et al.  R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment , 2014, Bioinform..

[65]  Benjamin M Neale,et al.  Meta-analysis of genome-wide association studies. , 2010, Cold Spring Harbor protocols.

[66]  L. Peltonen,et al.  Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis , 2011, Nature Genetics.

[67]  Michael Boehnke,et al.  LocusZoom: regional visualization of genome-wide association scan results , 2010, Bioinform..

[68]  T. Spector,et al.  Identification of PLCL 1 Gene for Hip Bone Size Variation in Females in a Genome-Wide Association Study , 2008 .

[69]  Margaret A. Pericak-Vance,et al.  A genome-wide scan for common alleles affecting risk for autism , 2010, Human molecular genetics.

[70]  E. Cook,et al.  Genome-wide association study identifies ITGB3 as a QTL for whole blood serotonin , 2004, European Journal of Human Genetics.

[71]  Sampath Prahalad,et al.  Dense genotyping of immune-related disease regions identifies 14 new susceptibility loci for juvenile idiopathic arthritis , 2013, Nature Genetics.

[72]  Christian Gieger,et al.  META-ANALYSIS OF GENOME-WIDE ASSOCIATION STUDIES IDENTIFIES THREE NEW RISK LOCI FOR ATOPIC DERMATITIS , 2011, Nature Genetics.

[73]  R. Kantero,et al.  III. Comparison of Height and Weight Distance Curves Based on Longitudinal and Cross‐Sectional Series from Birth to 10 Years , 1971 .

[74]  J. Sinsheimer,et al.  TCF7L2 is associated with high serum triacylglycerol and differentially expressed in adipose tissue in families with familial combined hyperlipidaemia , 2007, Diabetologia.

[75]  G. Davies,et al.  Associations among types of impulsivity, substance use problems and neurexin-3 polymorphisms. , 2011, Drug and alcohol dependence.