tailfindr: alignment-free poly(A) length measurement for Oxford Nanopore RNA and DNA sequencing

Polyadenylation at the 3’-end is a major regulator of messenger RNA, and poly(A) tail length is known to affect nuclear export, stability and translation, among other processes. Only recently have strategies emerged that allow for genome-wide poly(A) length assessment. These methods link poly(A) tail measurements to genes indirectly, by aligning short reads to genetic 3’-ends. Concurrently, Oxford Nanopore Technologies (ONT) established full-length isoform RNA sequencing in which reads contain the entire poly(A) tail. However, assessing poly(A) length through basecalling has so far not been possible due to the inability to resolve long homopolymeric stretches in ONT sequencing. Here we present tailfindr, an R package to estimate poly(A) tail length from ONT long-read sequencing data. tailfindr operates on unaligned, basecalled data. It measures poly(A) tail length from both native RNA and DNA sequencing, which makes poly(A) tail studies by full-length cDNA approaches possible for the first time. We assess tailfindr’s performance across different poly(A) lengths, demonstrating that tailfindr is a versatile tool providing poly(A) tail estimates across a wide range of sequencing conditions.
