Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA

Significance

Subtle changes in RNA transcript isoform expression can have dramatic effects on cellular behavior in both health and disease. Comprehensive and quantitative analysis of isoform-level transcriptomes would therefore open a new window into cellular diversity in fields ranging from developmental biology to cancer biology. The Rolling Circle Amplification to Concatemeric Consensus (R2C2) method presented here has sufficient throughput and accuracy to make comprehensive, quantitative analysis of RNA transcript isoforms in bulk and single-cell samples economically feasible.

Abstract

High-throughput short-read sequencing has revolutionized how transcriptomes are quantified and annotated. However, while Illumina short-read sequencers can analyze entire transcriptomes down to the level of individual splicing events with great accuracy, they fall short of resolving how these individual events are combined into complete RNA transcript isoforms. Long-distance information is therefore required to complement short-read sequencing when analyzing transcriptomes at the level of full-length RNA transcript isoforms. Long-read sequencing can provide this information, but both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) long-read sequencing technologies have limitations that have prevented their widespread adoption: PacBio sequencers produce relatively few reads with high accuracy, while ONT sequencers produce more reads with lower accuracy. Here, we introduce and validate R2C2, an ONT-based long-read sequencing method. At the same cost, R2C2 generates more accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method. These reads can be used to generate isoform-level transcriptomes for both genome annotation and differential expression analysis in bulk or single-cell samples.
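The idea named by R2C2, building a consensus from the repeated copies of one molecule in a concatemeric read, can be illustrated with a toy model. The Python sketch below is not the authors' pipeline: it assumes the repeat copies have already been extracted and aligned to equal length, treats every error as an independent substitution, and uses plain per-column majority voting. The function names `majority_consensus` and `majority_failure_bound` are hypothetical.

```python
from collections import Counter
from math import comb


def majority_consensus(repeats: list[str]) -> str:
    """Column-wise majority vote over repeat copies of one molecule.

    Toy model (assumption): copies are already aligned to equal length and
    contain only substitution errors, with no insertions or deletions.
    """
    if not repeats:
        raise ValueError("need at least one repeat copy")
    length = len(repeats[0])
    if any(len(r) != length for r in repeats):
        raise ValueError("repeat copies must be pre-aligned to equal length")
    consensus = []
    for column in zip(*repeats):
        # most_common(1) returns [(base, count)]; ties go to the base seen first
        base, _ = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


def majority_failure_bound(n: int, p: float) -> float:
    """Probability that the true base is NOT a strict majority among n copies.

    Assumes each copy is wrong at a position independently with probability p.
    Majority voting is guaranteed correct whenever the true base appears in
    more than n/2 copies, so this value upper-bounds the per-base error.
    """
    # k = number of copies carrying the true base; failure when k <= n // 2
    return sum(comb(n, k) * (1 - p) ** k * p ** (n - k) for k in range(n // 2 + 1))
```

For example, `majority_consensus(["ACGTAC", "ACGAAC", "TCGTAC"])` returns `"ACGTAC"`, correcting the single error in each of the first and second positions that differ. With a hypothetical per-base error rate of 10%, `majority_failure_bound(3, 0.1)` gives 0.028 and five copies give about 0.0086, showing why more repeats per molecule can raise accuracy. Real nanopore errors include insertions and deletions and are not fully independent, so actual consensus accuracy will differ from this idealized bound.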