TBMap: a taxonomic perspective on the phylogenetic database TreeBASE

Background: TreeBASE is currently the only available large-scale database of published organismal phylogenies. Its utility is hampered by a lack of taxonomic consistency, both within the database and with the names of organisms in external genomic, specimen, and taxonomic databases. This lack of consistency limits the extent to which the phylogenetic knowledge in TreeBASE can be integrated with these other sources.

Description: Taxonomic names in TreeBASE were mapped onto names in the external taxonomic databases IPNI, ITIS, NCBI, and uBio, and a graph G of these mappings was constructed. Additional edges representing taxonomic synonymies were added to G, then all components of G were extracted. These components correspond to "name clusters", and group together names in TreeBASE that are inferred to refer to the same taxon. The mapping to NCBI enables hierarchical queries to be performed, which can improve TreeBASE information retrieval by an order of magnitude.

Conclusion: The TBMap database provides a mapping of the bulk of the names in TreeBASE to names in external taxonomic databases, and a clustering of those mappings into sets of names that can be regarded as equivalent. This mapping enables queries and visualisations that cannot otherwise be constructed. A simple query interface to the mapping and name clusters is available at http://linnaeus.zoology.gla.ac.uk/~rpage/tbmap.
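
The clustering step in the description (build a graph of name mappings, add synonymy edges, extract components) can be illustrated with a minimal sketch. This is not the TBMap implementation: the function name `name_clusters`, the `Database:name` labelling scheme, and the placeholder taxon names are all assumptions made for illustration, and the components are found with a plain depth-first search.

```python
from collections import defaultdict


def name_clusters(edges):
    """Return the connected components of an undirected graph.

    `edges` is an iterable of (name_a, name_b) pairs. Each component is
    returned as a sorted list of names; the list of components is sorted too.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return sorted(clusters)


# Hypothetical toy data: placeholder names, not real TreeBASE records.
mapping_edges = [
    ("TreeBASE:Aus bus", "NCBI:Aus bus"),
    ("TreeBASE:Xus bus", "uBio:Xus bus"),
    ("TreeBASE:Cus dus", "ITIS:Cus dus"),
]
# Assumed synonymy linking two external names for the same taxon.
synonym_edges = [
    ("NCBI:Aus bus", "uBio:Xus bus"),
]

clusters = name_clusters(mapping_edges + synonym_edges)

# Keep only the TreeBASE names in each cluster.
treebase_clusters = [
    [name for name in cluster if name.startswith("TreeBASE:")]
    for cluster in clusters
]
print(treebase_clusters)
```

In this toy graph the synonymy edge joins two otherwise separate mappings, so the two TreeBASE names "Aus bus" and "Xus bus" fall into one cluster, while "Cus dus" remains in a cluster of its own.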
