Paper information - Poly(A)-tag deep sequencing data processing to extract poly(A) sites.

Polyadenylation [poly(A)] is an essential posttranscriptional processing step in the maturation of eukaryotic mRNA. Next-generation sequencing (NGS) technology has made it feasible to generate large-scale data and has opened new opportunities for intensive study of polyadenylation, particularly through deep sequencing of the transcriptome that targets the junction between the 3'-UTR and the poly(A) tail of the transcript. To take advantage of this unprecedented amount of data, we present an automated workflow that identifies polyadenylation sites by integrating NGS data cleaning, processing, mapping, normalizing, and clustering. In this pipeline, a series of Perl scripts is integrated to iteratively map single- or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same genome coordinate are grouped into one cleavage site, and internal priming artifacts are removed. An ambiguous region is then introduced to parse the genome annotation for cleavage site clustering. Finally, cleavage sites from different samples that lie within 24 nucleotides of one another can be clustered into poly(A) clusters. This procedure can be used to identify thousands of reliable poly(A) clusters from millions of NGS sequences in different tissues or treatments.
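
The grouping and clustering steps described above can be illustrated with a minimal Python sketch. It is not the published Perl pipeline: the function names, the `(chrom, strand, pos)` tag representation, and the rule that a site joins a cluster when it lies within 24 nucleotides of the previous site in that cluster are all assumptions made for illustration. Read cleaning, mapping, normalization, internal priming removal, and the annotation-based ambiguous region are omitted, and tags from all samples are assumed to be pooled beforehand.

```python
from collections import Counter


def group_tags(tags):
    """Collapse mapped poly(A) tags into cleavage sites.

    Each tag is a (chrom, strand, pos) tuple; tags sharing the same
    coordinate become one cleavage site with a tag count.
    """
    return Counter(tags)


def cluster_sites(sites, max_gap=24):
    """Merge cleavage sites into poly(A) clusters.

    Assumption: a site is added to the current cluster when it is on the
    same chromosome and strand and at most `max_gap` nt from the last
    site already in that cluster.
    """
    clusters = []
    for (chrom, strand, pos), count in sorted(sites.items()):
        last = clusters[-1] if clusters else None
        if (
            last is not None
            and last["chrom"] == chrom
            and last["strand"] == strand
            and pos - last["end"] <= max_gap
        ):
            last["end"] = pos
            last["tags"] += count
        else:
            clusters.append(
                {"chrom": chrom, "strand": strand,
                 "start": pos, "end": pos, "tags": count}
            )
    return clusters


# Hypothetical pooled tags from several samples
tags = [
    ("chr1", "+", 100), ("chr1", "+", 100), ("chr1", "+", 110),
    ("chr1", "+", 200), ("chr1", "-", 105),
]
sites = group_tags(tags)
clusters = cluster_sites(sites)
```

Because each site is compared with the previous site rather than with the cluster start, a cluster in this sketch can span more than 24 nucleotides; a fixed-window rule would be an equally plausible reading of the 24-nucleotide range.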
