Extraction of poly(A) sites from large-scale RNA-Seq data.

The NCBI maintains the Sequence Read Archive (SRA), a database that stores raw sequencing data, including RNA-Seq data, generated by different next-generation sequencing (NGS) technologies. The number of completed and ongoing genome and transcriptome sequencing projects keeps growing, so the data in SRA are expanding rapidly. They are a rich resource for mining information that can deepen our understanding of biological processes such as mRNA 3'-end formation and alternative polyadenylation. We developed a bioinformatics pipeline that processes raw SRA sequence data to obtain high-quality poly(A) sites and poly(A) cluster sites with detailed expression information. The pipeline is designed to be generic and can be used for polyadenylation studies in any eukaryotic species.
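The abstract does not describe the pipeline's individual steps or thresholds. The following is therefore only a minimal sketch of two operations that such a pipeline commonly performs, not the authors' method. The first operation identifies reads that end in a poly(A) tail and trims the tail. The second groups nearby cleavage sites into clusters whose read counts serve as expression levels. Several pieces are hypothetical, labeled here for illustration: the thresholds `MIN_TAIL`, `MIN_INSERT` and `CLUSTER_DIST`, the helper names, and the assumption that reads come from the sense strand. Genome alignment of the trimmed inserts is assumed to have been done separately and is not shown.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the abstract does not state them.
MIN_TAIL = 8        # trailing A's required to call a read poly(A)-tail-containing
MIN_INSERT = 20     # non-tail bases that must remain for genome alignment
CLUSTER_DIST = 24   # neighbouring sites within this many nt join the same cluster

# Matches the maximal run of A's at the 3' end, if it has at least MIN_TAIL bases.
TAIL_RE = re.compile(r"A{%d,}$" % MIN_TAIL)


def trim_polya(seq: str):
    """Strip the 3' poly(A) run; return the remaining insert, or None if the read does not qualify."""
    m = TAIL_RE.search(seq.upper())
    if m is None:
        return None
    insert = seq[: m.start()]
    return insert if len(insert) >= MIN_INSERT else None


def site_from_alignment(chrom: str, strand: str, ref_start: int, ref_end: int):
    """Poly(A) site = genomic coordinate of the insert's 3'-most base.

    Assumes a sense-strand read and 1-based inclusive alignment coordinates
    (hypothetical helper; the aligner producing them is not shown)."""
    return (chrom, strand, ref_end if strand == "+" else ref_start)


@dataclass
class PolyACluster:
    chrom: str
    strand: str
    start: int      # leftmost site in the cluster
    end: int        # rightmost site in the cluster
    top_pos: int    # site supported by the most reads
    top_reads: int  # reads at top_pos
    reads: int      # total supporting reads, used here as the expression level


def cluster_sites(sites, max_dist: int = CLUSTER_DIST):
    """Single-linkage merge of (chrom, strand, pos) sites whose neighbours lie within max_dist nt."""
    counts = Counter(sites)                      # reads supporting each exact site
    clusters = []
    cur = None
    for (chrom, strand, pos), n in sorted(counts.items()):
        same_locus = cur is not None and cur.chrom == chrom and cur.strand == strand
        if same_locus and pos - cur.end <= max_dist:
            cur.end = pos
            cur.reads += n
            if n > cur.top_reads:
                cur.top_pos, cur.top_reads = pos, n
        else:
            if cur is not None:
                clusters.append(cur)
            cur = PolyACluster(chrom, strand, pos, pos, pos, n, n)
    if cur is not None:
        clusters.append(cur)
    return clusters
```

In this sketch, a read ending in a run of twelve A's keeps its upstream insert, and a read without such a tail is discarded. Plus-strand sites at positions 1000 and 1010 fall within the 24-nt window and merge into one cluster, with 1010 as its most-supported site. Single linkage can chain many sites into one long cluster, and the sketch does not filter internal priming at genomic A-rich stretches. A production pipeline would need to handle both.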
