ORF Capture-Seq as a versatile method for targeted identification of full-length isoforms

Most human protein-coding genes are expressed as multiple isoforms, which greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned for every gene, the majority of alternative isoforms remain experimentally uncharacterized. This is primarily due to: i) vast differences in abundance between isoforms expressed from the same gene, and ii) the difficulty of obtaining contiguous full-length ORF sequences. Here, we present ORF Capture-Seq (OCS), a flexible and cost-effective method that addresses both challenges for targeted full-length isoform sequencing, using collections of cloned ORFs as probes. As a proof of concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude compared with unenriched samples. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will allow mapping of the full set of human isoforms at reasonable cost.
