Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases

Abstract

Community biological databases provide an important online resource for both public and private data, analysis tools and community engagement. These sites house genomic, transcriptomic, genetic, breeding and ancillary data for specific species, families or clades. Because these data are complex and growing in quantity, constructing such online resources is increasingly difficult, especially with limited funding and limited access to technical expertise. Furthermore, online repositories are expected to promote the FAIR data principles (findable, accessible, interoperable and reusable), which presents additional challenges. The open-source Tripal database toolkit seeks to mitigate these challenges by providing both the software and an interactive community of developers for constructing online community databases. Additionally, through coordinated, distributed co-development, Tripal sites encourage community-wide sustainability. Here, we report the release of Tripal version 3, which improves data accessibility and data sharing through systematic use of controlled vocabularies (CVs). Tripal uses the community-developed Chado database as its default data store, but now provides tools to support other data stores while ensuring that CVs remain the central organizational structure for the data. A new site developer can use Tripal to build a basic site with little to no programming and can integrate other data types using extension modules and the Tripal application programming interface. A thorough online User’s Guide and Developer’s Handbook, available at http://tripal.info, provide download, installation and step-by-step setup instructions.
