CapTrap-seq: A platform-agnostic and quantitative approach for high-fidelity full-length RNA transcript sequencing

Long-read RNA sequencing is essential for producing accurate and exhaustive annotations of eukaryotic genomes. Despite advances in throughput and accuracy, reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we developed CapTrap-seq, a cDNA library preparation method that combines the Cap-trapping strategy with oligo(dT) priming to detect 5'-capped, full-length transcripts, together with the data processing pipeline LyRic. We benchmarked CapTrap-seq against other popular RNA-seq library preparation protocols in a number of human tissues using both ONT and PacBio sequencing. To assess the accuracy of the resulting transcript models, we introduced a capping strategy for synthetic RNA spike-in sequences that mimics natural 5' cap formation. We found that the vast majority (up to 90%) of the transcript models that LyRic derives from CapTrap-seq reads are full-length. This makes it possible to produce highly accurate annotations with minimal human intervention.
