Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder

INTRODUCTION The DNA of protein-coding genes is transcribed into mRNA, which is translated into proteins. The “coding genome” describes the DNA that contains the information to make these proteins and represents ~1.5% of the human genome. Newly arising de novo mutations (variants observed in a child but not in either parent) in the coding genome contribute to numerous childhood developmental disorders, including autism spectrum disorder (ASD). Discovery of these effects is aided by the triplet code that enables the functional impact of many mutations to be readily deciphered. In contrast, the “noncoding genome” covers the remaining ~98.5% and includes elements that regulate when, where, and to what degree protein-coding genes are transcribed. Understanding this noncoding sequence could provide insights into human disorders and refined control of emerging genetic therapies. Yet little is known about the role of mutations in noncoding regions, including whether they contribute to childhood developmental disorders, which noncoding elements are most vulnerable to disruption, and the manner in which information is encoded in the noncoding genome.

RATIONALE Whole-genome sequencing (WGS) provides the opportunity to identify the majority of genetic variation in each individual. By performing WGS on 1902 quartet families including a child affected with ASD, one unaffected sibling control, and their parents, we identified ~67 de novo mutations across each child’s genome. To characterize the functional role of these mutations, we integrated multiple datasets relating to gene function, genes implicated in neurodevelopmental disorders, conservation across species, and epigenetic markers, thereby combinatorially defining 55,143 categories. The scope of the problem (testing for an excess of de novo mutations in cases relative to controls for each category) is challenging because there are more categories than families.
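The per-category comparison described in the rationale can be illustrated with a minimal sketch. It assumes equal numbers of cases and sibling controls (which holds for quartet families), so that under no association each de novo mutation in a category is equally likely to fall in a case or a control. The function names, the example counts, the exact two-sided binomial test, and the plain Bonferroni correction over 55,143 categories are all illustrative assumptions, not the procedure reported by the authors.

```python
from math import comb

N_CATEGORIES = 55_143  # number of annotation categories stated in the text


def binomial_two_sided_p(case_count: int, control_count: int) -> float:
    """Exact two-sided binomial test against p = 0.5 (hypothetical helper).

    Sums the probabilities of all outcomes that are no more likely than
    the observed split of mutations between cases and controls.
    """
    n = case_count + control_count
    if n == 0:
        return 1.0
    # Integer arithmetic for the numerator avoids float underflow for large n.
    probs = [comb(n, k) / 2**n for k in range(n + 1)]
    observed = probs[case_count]
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a plain Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Made-up counts for one category: 40 mutations in cases, 20 in controls.
    p = binomial_two_sided_p(40, 20)
    threshold = bonferroni_threshold(0.05, N_CATEGORIES)
    print(f"p = {p:.4g}, threshold = {threshold:.3g}, significant = {p < threshold}")
```

Because the threshold shrinks as the number of categories grows, a modest excess in any single category is unlikely to pass correction, which is the difficulty the rationale points to.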
RESULTS Comparing cases to controls, we observed an excess of de novo mutations in cases in individual categories in the coding genome but not in the noncoding genome. To overcome the challenge of detecting noncoding association, we used machine learning tools to develop a de novo risk score to look for an excess of de novo mutations across multiple categories. This score demonstrated a contribution to ASD risk from coding mutations and a weaker, but significant, contribution from noncoding mutations. This noncoding signal was driven by mutations in the promoter region, defined as the 2000 nucleotides upstream of the transcription start site (TSS) where mRNA synthesis starts. The strongest promoter signals were defined by conservation across species and transcription factor binding sites. Well-defined promoter elements (e.g., TATA-box) are usually observed within 80 nucleotides of the TSS; however, the strongest ASD association was observed distally, 750 to 2000 nucleotides upstream of the TSS.

CONCLUSION We conclude that de novo mutations in the noncoding genome contribute to ASD. The clearest evidence of noncoding ASD association came from mutations at evolutionarily conserved nucleotides in the promoter region. The enrichment for transcription factor binding sites, primarily in the distal promoter, suggests that these mutations may disrupt gene transcription via their interaction with enhancer elements in the promoter region, rather than interfering with transcriptional initiation directly.

Promoter regions in autism. De novo mutations from 1902 quartet families are assigned to 55,143 annotation categories, which are each assessed for autism spectrum disorder (ASD) association by comparing mutation counts in cases and sibling controls. A de novo risk score demonstrated a noncoding contribution to ASD driven by promoter mutations, especially at sites conserved across species, in the distal promoter or targeted by transcription factors.
Whole-genome sequencing (WGS) has facilitated the first genome-wide evaluations of the contribution of de novo noncoding mutations to complex disorders. Using WGS, we identified 255,106 de novo mutations among sample genomes from members of 1902 quartet families in which one child, but not a sibling or their parents, was affected by autism spectrum disorder (ASD). In contrast to coding mutations, no noncoding functional annotation category, analyzed in isolation, was significantly associated with ASD. Casting noncoding variation in the context of a de novo risk score across multiple annotation categories, however, did demonstrate association with mutations localized to promoter regions. We found that the strongest driver of this promoter signal emanates from evolutionarily conserved transcription factor binding sites distal to the transcription start site. These data suggest that de novo mutations in promoter regions, characterized by evolutionary and functional signatures, contribute to ASD.
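The promoter definitions given in the summary (the 2000 nucleotides upstream of the TSS, with the strongest signal 750 to 2000 nucleotides upstream) can be made concrete with a small strand-aware sketch. The function names, the inclusive boundaries, the "proximal" label for the remaining 1 to 749 nucleotides, and the example coordinates are assumptions made for illustration; they are not taken from the authors' annotation pipeline.

```python
PROMOTER_LENGTH = 2000  # nucleotides upstream of the TSS, as stated in the text
DISTAL_START = 750      # distal promoter begins 750 nt upstream (boundary assumed inclusive)


def upstream_distance(tss: int, position: int, strand: str) -> int:
    """Distance of a position upstream of the TSS; negative means downstream.

    On the "+" strand, upstream positions have smaller coordinates than the
    TSS; on the "-" strand, they have larger coordinates.
    """
    if strand == "+":
        return tss - position
    if strand == "-":
        return position - tss
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def promoter_zone(tss: int, position: int, strand: str) -> str:
    """Classify a mutation as 'distal', 'proximal', or 'outside' the promoter."""
    distance = upstream_distance(tss, position, strand)
    if distance < 1 or distance > PROMOTER_LENGTH:
        return "outside"
    if distance >= DISTAL_START:
        return "distal"
    return "proximal"  # hypothetical label for 1 to 749 nt upstream


if __name__ == "__main__":
    # Made-up coordinates: a TSS at position 10,000 on each strand.
    print(promoter_zone(10_000, 9_000, "+"))   # 1000 nt upstream on "+"
    print(promoter_zone(10_000, 9_950, "+"))   # 50 nt upstream on "+"
    print(promoter_zone(10_000, 11_000, "-"))  # 1000 nt upstream on "-"
```

Handling the strand explicitly matters because "upstream" flips direction in genomic coordinates for genes on the minus strand.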
