High-throughput RNA isoform sequencing using programmable cDNA concatenation

Alternative splicing is a core biological process that enables profound and essential diversification of gene function. Short-read RNA sequencing approaches fail to resolve RNA isoforms and therefore primarily enable gene expression measurements, an isoform-unaware representation of the transcriptome. Conversely, full-length RNA sequencing using long-read technologies is able to capture complete transcript isoforms, but its utility is constrained by limited throughput. Here, we introduce MAS-ISO-seq, a technique for programmably concatenating cDNAs into single molecules optimal for long-read sequencing, boosting throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. We validated unambiguous isoform assignment with MAS-ISO-seq using a synthetic RNA isoform library and applied this approach to single-cell RNA sequencing of tumor-infiltrating T cells. The results demonstrated a >30-fold increase in the discovery of differentially spliced genes and robust cell clustering, as well as canonical PTPRC splicing patterns across T cell subpopulations and the concerted expression of the associated hnRNPLL splicing factor. Methods such as MAS-ISO-seq will drive discovery of novel isoforms and the transition from gene expression to transcript isoform expression analyses.
