GB‐to‐TNT: facilitating creation of matrices from GenBank and diagnosis of results in TNT

This paper presents a pipeline, implemented in an open‐source program called GB→TNT (GenBank‐to‐TNT), for creating large molecular matrices, starting from GenBank files and ending with TNT matrices that incorporate taxonomic information in the terminal names. GB→TNT retrieves a defined genomic region from the many sequences contained in a GenBank file. The user specifies the genomic region to be retrieved and several filters (genome, sequence length, taxonomic group, etc.); each genomic region becomes a separate data block in the final TNT matrix. GB→TNT first generates FASTA files from the input GenBank files, then creates an alignment for each of them (by calling an alignment program), and finally merges all the aligned files into a single TNT matrix. The new version of TNT can use the taxonomic information contained in the terminal names, allowing easy diagnosis of results, evaluation of fit between the trees and the taxonomy, and automatic labelling or colouring of tree branches with the taxonomic groups they represent.
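
The final merging step can be illustrated with a minimal Python sketch. It is not GB→TNT's actual code: the function names `parse_fasta` and `merge_blocks` are hypothetical, the `name_@Group` terminal names are only an assumed way of carrying taxonomy in a name, and the output follows the general layout of a TNT `xread` matrix with one `&[dna]` block per genomic region. Taxa missing from a region are padded with `?`.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {terminal_name: sequence}."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the terminal name.
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def merge_blocks(blocks):
    """Merge aligned blocks (one dict per genomic region) into xread text.

    Assumes every sequence within a block has the same aligned length.
    """
    # Collect terminals in order of first appearance across all blocks.
    taxa = []
    for block in blocks:
        for name in block:
            if name not in taxa:
                taxa.append(name)

    # Each block must be a proper alignment: one shared length.
    widths = []
    for block in blocks:
        lengths = {len(seq) for seq in block.values()}
        if len(lengths) != 1:
            raise ValueError("block is empty or not aligned")
        widths.append(lengths.pop())

    lines = ["nstates dna;", "xread", f"{sum(widths)} {len(taxa)}"]
    for block, width in zip(blocks, widths):
        lines.append("&[dna]")
        for name in taxa:
            # Terminals absent from this region are filled with '?'.
            lines.append(f"{name} {block.get(name, '?' * width)}")
    lines += [";", "proc/;"]
    return "\n".join(lines)


# Hypothetical aligned regions; the second one lacks Mus_musculus.
region_a = parse_fasta(
    ">Homo_sapiens_@Primates\nACGT\n>Mus_musculus_@Rodentia\nAC-T\n"
)
region_b = parse_fasta(">Homo_sapiens_@Primates\nGG\n")
matrix = merge_blocks([region_a, region_b])
print(matrix)
```

Keeping each region as its own block, rather than concatenating sequences into one string per terminal, mirrors the description above in which each genomic region is a different data block of the final matrix.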
