phyloXML: XML for evolutionary biology and comparative genomics

Background: Evolutionary trees are central to a wide range of biological studies. In many of these studies, tree nodes and branches need to be associated (or annotated) with various attributes. For example, in studies of organismal relationships, tree nodes are associated with taxonomic names, whereas tree branches have lengths and often support values. Gene trees used in comparative genomics or phylogenomics are usually annotated with taxonomic information; with genome-related data, such as gene names and functional annotations; and with events such as gene duplications, speciations, or exon shufflings, all combined with information related to the evolutionary tree itself. The data standards currently used for evolutionary trees have limited capacity to incorporate such annotations of different data types.

Results: We developed an XML language, named phyloXML, for describing evolutionary trees as well as various associated data items. PhyloXML provides elements for commonly used items, such as branch lengths, support values, taxonomic names, and gene names and identifiers. By using "property" elements, phyloXML can be adapted to novel and unforeseen use cases. We also developed various software tools for reading, writing, converting, and visualizing phyloXML-formatted data.

Conclusion: PhyloXML is an XML language defined by a complete schema in XSD that allows storing and exchanging the structures of evolutionary trees as well as associated data. More information about phyloXML itself, the XSD schema, and the tools implementing and supporting phyloXML is available at http://www.phyloxml.org.
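To make the kinds of annotation described above concrete, the following is a minimal sketch of a small phyloXML-style document and a reader for it, using only the Python standard library. The sample tree, the gene names, and the property reference `example:depth` are hypothetical and chosen for illustration; the document has not been validated against the XSD schema, and the reader is not one of the tools mentioned in the abstract.

```python
import xml.etree.ElementTree as ET

# Namespace used by phyloXML documents.
PHYLOXML_NS = "http://www.phyloxml.org"
NS = {"phy": PHYLOXML_NS}

# Hypothetical two-leaf gene tree. The internal clade carries a duplication
# event; the leaves carry names, branch lengths, a support value, taxonomy,
# and one free-form "property" element (its ref is an invented example).
SAMPLE = """\
<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <name>hypothetical gene tree</name>
    <clade>
      <events><duplications>1</duplications></events>
      <clade>
        <name>geneA</name>
        <branch_length>0.12</branch_length>
        <confidence type="bootstrap">89</confidence>
        <taxonomy><scientific_name>Homo sapiens</scientific_name></taxonomy>
        <property ref="example:depth" datatype="xsd:integer" applies_to="clade">3</property>
      </clade>
      <clade>
        <name>geneB</name>
        <branch_length>0.30</branch_length>
        <taxonomy><scientific_name>Mus musculus</scientific_name></taxonomy>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>
"""


def describe_leaves(xml_text):
    """Return one dict per leaf clade (a clade with no child clades)."""
    root = ET.fromstring(xml_text)
    rows = []
    for clade in root.iter("{%s}clade" % PHYLOXML_NS):
        if clade.find("phy:clade", NS) is not None:
            continue  # internal node: it has at least one child clade
        length = clade.findtext("phy:branch_length", namespaces=NS)
        rows.append({
            "name": clade.findtext("phy:name", default="", namespaces=NS),
            "branch_length": float(length) if length is not None else None,
            "species": clade.findtext(
                "phy:taxonomy/phy:scientific_name", namespaces=NS),
            # Map each property's ref attribute to its text value.
            "properties": {
                p.get("ref"): p.text
                for p in clade.findall("phy:property", NS)
            },
        })
    return rows


if __name__ == "__main__":
    for row in describe_leaves(SAMPLE):
        print(row)
```

Because the annotations are ordinary XML elements, a generic XML parser is enough to recover them, and an unforeseen data item can be attached through a "property" element without changing the reader's logic for the standard elements.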
