TRAL: tandem repeat annotation library

MOTIVATION: Currently, more than 40 sequence tandem repeat detectors are published, providing heterogeneous, partly complementary, partly conflicting results.

RESULTS: We present TRAL, a tandem repeat annotation library that allows running various detectors and parsing their outputs, clustering of redundant or overlapping annotations, several statistical frameworks for filtering false positive annotations, and, importantly, a tandem repeat annotation and refinement module based on circular profile hidden Markov models (cpHMMs). Using TRAL, we evaluated the performance of a multi-step tandem repeat annotation workflow on 547 085 sequences in UniProtKB/Swiss-Prot. Researchers can use these results to predict run-times for specific datasets and to choose annotation complexity accordingly.

AVAILABILITY AND IMPLEMENTATION: TRAL is an open-source Python 3 library and is available, together with documentation and tutorials, via http://www.vital-it.ch/software/tral.

CONTACT: elke.schaper@isb-sib.ch
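To make the "filter, then cluster overlapping annotations" idea concrete, here is a minimal sketch in plain Python. It does not use TRAL's API: the `Annotation` class, the functions `cluster_overlapping` and `filter_and_select`, the `p_value` field, and the threshold `alpha` are all hypothetical names chosen for illustration, and the rule of keeping the lowest p-value per cluster is an assumption, not necessarily the selection rule TRAL applies. Detector names appear only as labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one predicted tandem repeat region."""
    detector: str    # label of the detector that produced the prediction
    begin: int       # start position in the sequence (0-based, inclusive)
    end: int         # end position in the sequence (exclusive)
    p_value: float   # assumed significance score from some statistical test


def cluster_overlapping(annotations):
    """Group annotations whose sequence ranges overlap (transitively)."""
    clusters = []
    current_end = None
    for ann in sorted(annotations, key=lambda a: (a.begin, a.end)):
        if clusters and ann.begin < current_end:
            # Overlaps the running cluster: extend it.
            clusters[-1].append(ann)
            current_end = max(current_end, ann.end)
        else:
            # No overlap: start a new cluster.
            clusters.append([ann])
            current_end = ann.end
    return clusters


def filter_and_select(annotations, alpha=0.05):
    """Drop annotations with p_value > alpha, then keep one per cluster.

    Assumption: the representative of each cluster is the annotation
    with the lowest p-value.
    """
    significant = [a for a in annotations if a.p_value <= alpha]
    return [
        min(cluster, key=lambda a: a.p_value)
        for cluster in cluster_overlapping(significant)
    ]
```

With such a helper, predictions from several detectors on the same sequence could be pooled into one list and reduced to a non-redundant set by a single call to `filter_and_select`.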
