Annocript: a flexible pipeline for the annotation of transcriptomes able to identify putative long noncoding RNAs

The eukaryotic transcriptome is composed of thousands of coding and long non-coding RNAs (lncRNAs). However, we lack a software platform to identify both RNA classes in a given transcriptome. Here we introduce Annocript, a pipeline that combines the annotation of protein-coding transcripts with the prediction of putative lncRNAs in whole transcriptomes. It downloads and indexes the needed databases, runs the analysis and produces human-readable and standard outputs, together with summary statistics of the whole analysis.

Availability and implementation: Annocript is distributed under the GNU General Public License (version 3 or later) and is freely available at https://github.com/frankMusacchia/Annocript.

Contact: remo.sanges@szn.it
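The abstract does not spell out how a transcript ends up in the coding or the lncRNA class. The sketch below is a minimal illustration of one common style of rule, under stated assumptions: a transcript with any protein or domain hit is treated as coding, while a transcript without hits is flagged as a putative lncRNA only if it is at least 200 nt long, has a short longest ORF, and receives a high non-coding probability. The `Transcript` fields, the `classify` and `summarize` functions, and all thresholds are hypothetical and are not Annocript's actual code or defaults.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


# Hypothetical per-transcript features. In a real pipeline these would be
# filled in from similarity searches (e.g. BLAST+ against a protein and a
# domain database) and from an ORF finder / coding-potential predictor.
@dataclass
class Transcript:
    name: str
    length_nt: int          # transcript length in nucleotides
    longest_orf_aa: int     # longest ORF length in amino acids
    has_protein_hit: bool   # significant hit against a protein database
    has_domain_hit: bool    # significant hit against a domain database
    noncoding_score: float  # predicted probability of being non-coding, 0..1


# Illustrative thresholds (assumptions, not Annocript's documented defaults).
MIN_LENGTH_NT = 200
MAX_ORF_AA = 100
MIN_NONCODING_SCORE = 0.95


def classify(t: Transcript) -> str:
    """Label one transcript as 'coding', 'putative_lncRNA' or 'unclassified'."""
    # Any protein or domain match is taken as evidence of coding potential.
    if t.has_protein_hit or t.has_domain_hit:
        return "coding"
    # No homology: call a putative lncRNA only if it is long enough, has no
    # long ORF, and the predictor is confident it is non-coding.
    if (t.length_nt >= MIN_LENGTH_NT
            and t.longest_orf_aa < MAX_ORF_AA
            and t.noncoding_score >= MIN_NONCODING_SCORE):
        return "putative_lncRNA"
    return "unclassified"


def summarize(transcripts: Iterable[Transcript]) -> Counter:
    """Count transcripts per class, as a summary-statistics report might."""
    return Counter(classify(t) for t in transcripts)


if __name__ == "__main__":
    example = [
        Transcript("tr1", 1500, 420, True, True, 0.01),   # coding
        Transcript("tr2", 850, 45, False, False, 0.98),   # putative_lncRNA
        Transcript("tr3", 150, 20, False, False, 0.99),   # unclassified: < 200 nt
    ]
    for t in example:
        print(t.name, classify(t))
    print(summarize(example))
```

Ordering the checks this way lets homology evidence override the sequence-based score, and transcripts that satisfy neither route are left unclassified rather than forced into one of the two classes.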
