On the implementation of a phylogenetic tree database

A molecular phylogenetic tree is a tree-structured graph that represents the evolutionary process of genes and is constructed from sequence data (such as DNA sequences) obtained from several organisms. Although molecular phylogenetic trees are fundamental data structures in evolutionary analysis, no available database system can match stored trees against a user-supplied tree by comparing tree structures. In this paper, we propose a phylogenetic tree database system with a retrieval function that finds trees of similar structure. The tree data stored in the database are extracted from document images published in biological journals using a pattern-recognition program that we developed. To retrieve phylogenetic trees from the database according to their structure, we propose a measure of structural similarity between trees that is based on the split distance method. Our measure correlates highly with the log-likelihood difference, which is widely used for comparing phylogenetic trees, and it takes much less time to compute than the log-likelihood difference, which relies on sequence comparison.
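To illustrate the kind of comparison the split distance method performs, the following minimal sketch computes the plain split distance between two trees over the same set of leaves. It is not the similarity measure proposed in this paper, only the standard distance on which such a measure could be based. The nested-tuple tree encoding, the function names (leaves, splits, split_distance, split_similarity), and the normalization to a score between 0 and 1 are all assumptions made for illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a string label, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial splits (leaf bipartitions) induced by internal edges.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so the result does not depend on where the tree is rooted.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def split_distance(t1: Tree, t2: Tree) -> int:
    """Count the splits that appear in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same leaf set")
    return len(splits(t1) ^ splits(t2))


def split_similarity(t1: Tree, t2: Tree) -> float:
    """Assumed normalization: 1.0 for identical topologies, 0.0 for no shared split."""
    total = len(splits(t1)) + len(splits(t2))
    return 1.0 if total == 0 else 1.0 - split_distance(t1, t2) / total


if __name__ == "__main__":
    stored = (("A", "B"), ("C", ("D", "E")))
    query = (("A", "C"), ("B", ("D", "E")))
    print(split_distance(stored, query))
    print(split_similarity(stored, query))
```

Because the comparison uses only sets of leaf labels, it needs no sequence data, which is consistent with the short computation time noted above. A retrieval function could, under these assumptions, rank stored trees by split_similarity against the user-supplied tree.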