Fine-Scale Variation and Genetic Determinants of Alternative Splicing across Individuals

Recently, thanks to the increasing throughput of new technologies, we have begun to explore the full extent of alternative pre-mRNA splicing (AS) in the human transcriptome. This is unveiling a vast layer of complexity in isoform-level expression differences between individuals. We used previously published splicing-sensitive microarray data from lymphoblastoid cell lines to conduct an in-depth analysis of the splicing efficiency of known and predicted exons. By combining publicly available AS annotation with a novel algorithm designed to search for AS, we show that many real AS events can be detected within the usually unexploited, speculative majority of the array and at significance levels well below standard multiple-testing thresholds, demonstrating that the extent of cis-regulated differential splicing between individuals is potentially far greater than previously reported. Specifically, many genes show subtle but significant genetically controlled differences in splice-site usage. PCR validation shows that 42 out of 58 (72%) candidate gene regions undergo detectable AS, amounting to the largest-scale validation of isoform eQTLs to date. Targeted sequencing revealed a likely causative SNP in most validated cases. In all 17 instances where a SNP affected a splice-site region, in silico splice-site strength modeling correctly predicted the direction of the microarray and PCR results. In 13 other cases, we identified likely causative SNPs disrupting predicted splicing enhancers. Using Fst and REHH analysis, we uncovered significant evidence that two putative causative SNPs have undergone recent positive selection. We verified the effect of five SNPs using in vivo minigene assays. This study shows that splicing differences between individuals, including quantitative differences in isoform ratios, are frequent in human populations and that causative SNPs can be identified using in silico predictions. Several cases affected disease-relevant genes, and it is likely that some of these differences are involved in phenotypic diversity and susceptibility to complex diseases.
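
As an illustration of the Fst statistic mentioned above, the following is a minimal sketch of Wright's fixation index for a single biallelic SNP, computed from per-population allele frequencies. It is not the procedure used in the study: the function name `wright_fst`, the equal weighting of populations, and the example frequencies are all assumptions made for illustration.

```python
from typing import Sequence


def wright_fst(allele_freqs: Sequence[float]) -> float:
    """Wright's Fst = (H_T - H_S) / H_T for one biallelic SNP.

    allele_freqs holds the frequency of one allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    if len(allele_freqs) < 2:
        raise ValueError("need allele frequencies from at least two populations")
    if any(p < 0.0 or p > 1.0 for p in allele_freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")

    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n

    # Expected heterozygosity in the pooled (total) population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each subpopulation.
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n

    if h_total == 0.0:
        # The SNP is monomorphic across all populations; no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical frequencies of one allele in two populations.
    print(wright_fst([0.2, 0.8]))
```

Values near 0 indicate similar allele frequencies across populations, while values near 1 indicate strong differentiation, which is one of the signals that can point to population-specific selection.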
