CDAOStore: A Phylogenetic Repository Using Logic Programming and Web Services

CDAOStore is a portal for storing and retrieving the data and metadata associated with studies in evolutionary biology and phylogenetic analysis. Its novelty lies in a semantic-based approach to data storage and querying. This approach lets CDAOStore overcome the data format restrictions and complexities of other repositories (e.g., TreeBase) and provide a domain-specific query interface derived from studies of the querying requirements of phylogenetic databases. CDAOStore is the first full implementation of the EvoIO stack, an interoperation stack composed of a formal ontology (the Comparative Data Analysis Ontology, CDAO), an XML exchange format (NeXML), and a web services API (PhyloWS). CDAOStore is implemented on top of an RDF triple store, using a combination of standard web technologies and logic programming. In particular, we employed Prolog to support some of the format transformation tasks and, more importantly, to implement several of the domain-specific queries, whose structure is beyond the reach of standard RDF query languages (e.g., SPARQL). CDAOStore is operational and already hosts over 90 million RDF triples, imported from TreeBase or submitted by other domain scientists.
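To illustrate the kind of query whose structure is recursive rather than a fixed pattern match, the following minimal sketch finds the most recent common ancestor of two nodes in a tree stored as triples. It is not CDAOStore's implementation: the abstract does not name the specific queries, the sketch is written in Python rather than Prolog, and the predicate name `has_Parent`, the sample taxa, and the helper functions are all assumptions made for illustration. It also assumes the triples describe a rooted, acyclic tree.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical predicate linking a node to its parent (illustrative only).
HAS_PARENT = "has_Parent"

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def parent_map(triples: List[Triple]) -> Dict[str, str]:
    """Index child -> parent from the parent-edge triples."""
    return {s: o for s, p, o in triples if p == HAS_PARENT}


def ancestors(node: str, parents: Dict[str, str]) -> List[str]:
    """Return the path from node up to the root, starting with node itself.

    Assumes the parent edges form a rooted tree (no cycles).
    """
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def mrca(a: str, b: str, parents: Dict[str, str]) -> Optional[str]:
    """Most recent common ancestor of a and b, or None if they share none."""
    seen = set(ancestors(a, parents))
    # Walk up from b; the first node also on a's path is the MRCA.
    for node in ancestors(b, parents):
        if node in seen:
            return node
    return None


# Made-up sample data, not taken from CDAOStore.
triples: List[Triple] = [
    ("human", HAS_PARENT, "hominini"),
    ("chimp", HAS_PARENT, "hominini"),
    ("hominini", HAS_PARENT, "homininae"),
    ("gorilla", HAS_PARENT, "homininae"),
]
parents = parent_map(triples)
```

Calling `mrca("human", "gorilla", parents)` walks both paths to the root and returns the first shared node. In a logic programming setting the same traversal would typically be written as a recursive ancestor relation, which is one reason a language such as Prolog is a natural fit for this style of query.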
