Publishing FAIR Data: An Exemplar Methodology Utilizing PHI-Base

Pathogen-host interaction data are core to our understanding of disease processes and their molecular and genetic bases. Ready access to such data is particularly important for the plant sciences, where individual genetic and phenotypic observations are dispersed across a wide diversity of plant species, in contrast to the relatively few host species of interest to biomedical researchers. Recently, an international initiative interested in scholarly data publishing proposed that all scientific data should be “FAIR”: Findable, Accessible, Interoperable, and Reusable. In this work, we describe the process of migrating a database of notable relevance to the plant sciences, the Pathogen-Host Interaction Database (PHI-base), to a form that conforms to each of the FAIR Principles. We discuss the technical and architectural decisions and the migration pathway, including observations on the difficulty and fidelity of each step. We examine how multiple FAIR Principles can be addressed simultaneously through careful design decisions, including making data FAIR for both humans and machines with minimal duplication of effort. We note that FAIR data publishing involves more than data reformatting, and requires features beyond those exhibited by most life science Semantic Web or Linked Data resources. We explore the value added by completing this FAIR data transformation, and then test the result through integrative questions that could not easily be asked over traditional Web-based data resources. Finally, we demonstrate the utility of providing explicit and reliable access to provenance information, which we argue enhances citation rates by encouraging and facilitating transparent scholarly reuse of these valuable data holdings.
