Phylesystem: a git-based data store for community-curated phylogenetic estimates

Motivation: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct.

Results: Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git’s version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the ‘phylesystem-api’, which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements.

Availability and implementation: Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api. The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem. A web application that uses the phylesystem web services is deployed at http://tree.opentreeoflife.org/curator. Code for that tool is available from https://github.com/OpenTreeOfLife/opentree.

Contact: mtholder@gmail.com
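The Results describe create, read, update and delete operations carried out as commits to a git repository, with git's history serving as the provenance record. The following is a minimal sketch of that idea, not the phylesystem-api itself: the function names (`init_store`, `write_study`, `read_study`, `delete_study`, `history`), the one-JSON-file-per-study layout and the placeholder e-mail addresses are all assumptions made for illustration. It only requires the `git` command-line tool to be installed.

```python
import json
import subprocess
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def init_store(repo):
    """Create an empty git repository to act as the data store."""
    repo = Path(repo)
    repo.mkdir(parents=True, exist_ok=True)
    git(repo, "init", "-q")
    return repo


def _commit(repo, message, author):
    # The identity is passed per command so the sketch does not rely on
    # global git configuration; the e-mail domain is a placeholder.
    git(repo,
        "-c", f"user.name={author}",
        "-c", f"user.email={author}@example.org",
        "-c", "commit.gpgsign=false",
        "commit", "-q", "-m", message)


def write_study(repo, study_id, data, author):
    """Create or update a record (hypothetical layout: <study_id>.json)."""
    path = Path(repo) / f"{study_id}.json"
    verb = "Update" if path.exists() else "Create"
    path.write_text(json.dumps(data, indent=2, sort_keys=True) + "\n")
    git(repo, "add", path.name)
    # Note: git refuses to commit if the content is unchanged.
    _commit(repo, f"{verb} {study_id}", author)


def read_study(repo, study_id):
    """Read the current version of a record from the working tree."""
    return json.loads((Path(repo) / f"{study_id}.json").read_text())


def delete_study(repo, study_id, author):
    """Delete a record; earlier versions remain in the git history."""
    git(repo, "rm", "-q", f"{study_id}.json")
    _commit(repo, f"Delete {study_id}", author)


def history(repo, study_id):
    """Return 'author: message' for each commit touching the record, newest first."""
    out = git(repo, "log", "--format=%an: %s", "--", f"{study_id}.json")
    return out.splitlines()
```

In this sketch, calling `write_study` twice for the same study with different content and different authors yields two commits, and `history` then lists both curators with their edits. That is the property the abstract relies on for crediting editors and tracing curation errors.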

[1] Keith A. Crandall, et al. Lost Branches on the Tree of Life, 2013, PLoS Biology.

[2] Karthik Ram, et al. Git can facilitate greater reproducibility and increased transparency in science, 2013, Source Code for Biology and Medicine.

[3] T. Vision. Open Data and the Social Contract of Scientific Publishing, 2010.

[4] Luke J. Harmon, et al. Best Practices for Data Sharing in Phylogenetic Research, 2014, PLoS Currents.

[5] Arlin Stoltzfus, et al. Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis, 2012, BMC Research Notes.

[6] Gregor Hagedorn, et al. Scientific names of organisms: attribution, rights, and licensing, 2014, BMC Research Notes.

[7] Andrew F. Magee, et al. The Dawn of Open Access to Phylogenetic Data, 2014, PLoS ONE.

[8] J. G. Burleigh, et al. Synthesis of phylogeny and taxonomy into a comprehensive tree of life, 2014, Proceedings of the National Academy of Sciences.

[9] Cécile Ané, et al. Missing the forest for the trees: phylogenetic compression and its implications for inferring complex evolutionary histories, 2005, Systematic Biology.

[10] Hilmar Lapp, et al. NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata, 2012, Systematic Biology.

[11] Jeannine Cavender-Bares, et al. Synthesizing phylogenetic knowledge for ecological research, 2012.

[12] D. Maddison, et al. NEXUS: an extensible file format for systematic information, 1997, Systematic Biology.

[13] J. Felsenstein. Phylogenies and the Comparative Method, 1985, The American Naturalist.

[14] Douglas Crockford, et al. The application/json Media Type for JavaScript Object Notation (JSON), 2006, RFC.