Abundant contribution of short tandem repeats to gene expression variation in humans

The contribution of repetitive elements to quantitative human traits is largely unknown. Here we report a genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from that of linked SNPs and indels and found that eSTRs contribute 10–15% of the cis heritability mediated by all common variants. Further functional genomic analyses showed that eSTRs are enriched in conserved regions, colocalize with regulatory elements and may modulate certain histone modifications. By analyzing known genome-wide association study (GWAS) signals and searching for new associations in 1,685 whole genomes from deeply phenotyped individuals, we found that eSTRs are enriched in various clinically relevant conditions. These results highlight the contribution of STRs to the genetic architecture of quantitative human traits.
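As a rough illustration of the variance-partitioning idea mentioned above, the sketch below simulates one gene whose expression depends on an STR and on linked SNPs, then asks how much variance the STR explains beyond the SNPs. It is a toy under stated assumptions, not the authors' method: it uses nested ordinary least squares in place of a mixed model, and every name, effect size and sample size in it (`r_squared`, `partition_variance`, the 0.3 and 0.4 coefficients, `n=5000`) is hypothetical and chosen only for illustration.

```python
import numpy as np


def r_squared(X, y):
    """Fraction of variance in y explained by an OLS fit on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


def partition_variance(n=5000, seed=0):
    """Simulate one gene and split explained variance into SNP and STR parts.

    All parameters are illustrative assumptions, not values from the study.
    Returns (variance explained by SNPs alone,
             extra variance explained once the STR is added).
    """
    rng = np.random.default_rng(seed)

    # Five biallelic SNPs coded as 0/1/2 allele counts.
    snps = rng.binomial(2, 0.3, size=(n, 5)).astype(float)

    # STR length is partly correlated with the first SNP (a stand-in for
    # linkage disequilibrium) and partly varies on its own.
    str_len = 0.5 * snps[:, 0] + rng.normal(0.0, 1.0, n)

    # Expression depends on the STR, on two of the SNPs, and on noise.
    snp_effects = np.array([0.4, 0.2, 0.0, 0.0, 0.0])
    expr = 0.3 * str_len + snps @ snp_effects + rng.normal(0.0, 1.0, n)

    r2_snp = r_squared(snps, expr)
    r2_full = r_squared(np.column_stack([snps, str_len]), expr)
    return r2_snp, r2_full - r2_snp


if __name__ == "__main__":
    r2_snp, r2_str = partition_variance()
    print(f"explained by SNPs alone:      {r2_snp:.3f}")
    print(f"added by the STR beyond SNPs: {r2_str:.3f}")
```

Fitting the SNPs first and the STR second credits the STR only with variance that the linked SNPs cannot already account for, which is the conservative direction when asking how much a repeat contributes on top of nearby variants.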
