Single Nucleotide Polymorphisms Can Create Alternative Polyadenylation Signals and Affect Gene Expression through Loss of MicroRNA-Regulation

Alternative polyadenylation (APA) can for example occur when a protein-coding gene has several polyadenylation (polyA) signals in its last exon, resulting in messenger RNAs (mRNAs) with different 3′ untranslated region (UTR) lengths. Different 3′UTR lengths can give different microRNA (miRNA) regulation such that shortened transcripts have increased expression. The APA process is part of human cells' natural regulatory processes, but APA also seems to play an important role in many human diseases. Although altered APA in disease can have many causes, we reasoned that mutations in DNA elements that are important for the polyA process, such as the polyA signal and the downstream GU-rich region, can be one important mechanism. To test this hypothesis, we identified single nucleotide polymorphisms (SNPs) that can create or disrupt APA signals (APA-SNPs). By using a data-integrative approach, we show that APA-SNPs can affect 3′UTR length, miRNA regulation, and mRNA expression—both between homozygote individuals and within heterozygote individuals. Furthermore, we show that a significant fraction of the alleles that cause APA are strongly and positively linked with alleles found by genome-wide studies to be associated with disease. Our results confirm that APA-SNPs can give altered gene regulation and that APA alleles that give shortened transcripts and increased gene expression can be important hereditary causes for disease.

[1]  Laurent F. Thomas,et al.  Inferring causative variants in microRNA target sites , 2011, Nucleic acids research.

[2]  Peter M. Rice,et al.  The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants , 2009, Nucleic acids research.

[3]  Eric T. Wang,et al.  Alternative Isoform Regulation in Human Tissue Transcriptomes , 2008, Nature.

[4]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[5]  Zhaohui S. Qin,et al.  A second generation human haplotype map of over 3.1 million SNPs , 2007, Nature.

[6]  David Haussler,et al.  The UCSC genome browser database: update 2007 , 2006, Nucleic Acids Res..

[7]  Elizabeth M. Smigielski,et al.  dbSNP: the NCBI database of genetic variation , 2001, Nucleic Acids Res..

[8]  R. Redon,et al.  Relative Impact of Nucleotide and Copy Number Variation on Gene Expression Phenotypes , 2007, Science.

[9]  International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome , 2001, Nature.

[10]  R. Bertina,et al.  Polymorphism 10034C>T is located in a region regulating polyadenylation of FGG transcripts and influences the fibrinogen γ′/γA mRNA ratio , 2007 .

[11]  Bin Tian,et al.  PolyA_DB 2: mRNA polyadenylation sites in vertebrate genes , 2007, Nucleic Acids Res..

[12]  J. Manley,et al.  Mechanism and regulation of mRNA polyadenylation. , 1997, Genes & development.

[13]  F. Collins,et al.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits , 2009, Proceedings of the National Academy of Sciences.

[14]  M. Hentze,et al.  3′ end mRNA processing: molecular mechanisms and implications for health and disease , 2008, The EMBO journal.

[15]  Adrian Wiestner,et al.  Point mutations and genomic deletions in CCND1 create stable truncated cyclin D1 mRNAs that are associated with increased proliferation rate and shorter survival. , 2007, Blood.

[16]  Mary Goldman,et al.  The UCSC Genome Browser database: update 2011 , 2010, Nucleic Acids Res..

[17]  Michael Recce,et al.  PolyA_DB: a database for mammalian mRNA polyadenylation , 2004, Nucleic Acids Res..

[18]  C. Sander,et al.  A Mammalian microRNA Expression Atlas Based on Small RNA Library Sequencing , 2007, Cell.

[19]  W. G. Hill,et al.  Linkage disequilibrium in finite populations , 1968, Theoretical and Applied Genetics.

[20]  Takaya Saito,et al.  A two-step site and mRNA-level model for predicting microRNA targets , 2010, BMC Bioinformatics.

[21]  J. V. Moran,et al.  Initial sequencing and analysis of the human genome. , 2001, Nature.

[22]  Michael Q. Zhang,et al.  Using quality scores and longer reads improves accuracy of Solexa read mapping , 2008, BMC Bioinformatics.

[23]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[24]  C. Mayr,et al.  Widespread Shortening of 3′UTRs by Alternative Cleavage and Polyadenylation Activates Oncogenes in Cancer Cells , 2009, Cell.

[25]  Stijn van Dongen,et al.  miRBase: tools for microRNA genomics , 2007, Nucleic Acids Res..

[26]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[27]  Zhi John Lu,et al.  Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project , 2011, Nucleic acids research.

[28]  J. Graber,et al.  Global changes in processing of mRNA 3' untranslated regions characterize clinically distinct cancer subtypes. , 2009, Cancer research.

[29]  Bin Tian,et al.  A functional human Poly(A) site requires only a potent DSE and an A-rich upstream sequence , 2010, The EMBO journal.

[30]  Cole Trapnell,et al.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. , 2010, Nature biotechnology.

[31]  Bin Tian,et al.  A large-scale analysis of mRNA polyadenylation of human and mouse genes , 2005, Nucleic acids research.

[32]  R. Knight,et al.  Regions and Fewer MicroRNA Target Sites Proliferating Cells Express mRNAs with Shortened 3 ' Untranslated , 2012 .

[33]  Jean YH Yang,et al.  Bioconductor: open software development for computational biology and bioinformatics , 2004, Genome Biology.

[34]  Hugues Sicotte,et al.  Genome-Wide Transcriptional Profiling Reveals MicroRNA-Correlated Genes and Biological Processes in Human Lymphoblastoid Cell Lines , 2009, PloS one.

[35]  William Ritchie,et al.  Differential Repression of Alternative Transcripts: A Screen for miRNA Targets , 2006, PLoS Comput. Biol..

[36]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[37]  D. Clayton,et al.  Genome-wide analysis of allelic expression imbalance in human primary cells by high-throughput transcriptome resequencing , 2009, Human molecular genetics.