Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities

As the biodiversity community increasingly adopts Semantic Web (SW) standards to represent taxonomic registers, trait banks or museum collections, some questions come up relentlessly: How to model the data? For what goals? Can the same model fulfill different goals? So far, the community has mostly considered the SW standards through their most salient manifestation: the Web of Linked Data (Heath and Bizer 2011). Indeed, the 5-star Linked Data principles are geared towards the building of a large, distributed knowledge graph that may successfully fulfill biodiversity’s need for interoperability and data integration. However, the SW addresses a much broader set of problems involving automatic reasoning. For instance, reasoners can exploit ontological knowledge to improve query answering, leverage class definitions to infer class subsumption relationships, or classify individuals, i.e. compute instance relationships between individuals and classes by applying reasoning techniques to class definitions and instance descriptions (Shearer et al. 2008). Whether a "thing" should be modelled as a class or a class instance has been debated at length in the SW community, and the answer is often a matter of perspective. In the context of taxonomic registers for example, the NCBI Organismal Classification (Federhen 2012) and the Vertebrate Taxonomy Ontology (Midford et al. 2013) represent taxa as classes in the Web Ontology Language (OWL). By contrast, other initiatives represent taxa as instances of various classes, e.g. the SKOS Concept class (skos:Concept) in the AGROVOC thesaurus (Caracciolo et al. 2013) (we speak of the instances as SKOS concepts), the Darwin Core taxon class (dwc:Taxon) in the Encyclopedia of Life (Parr et al. 2016), or classes depicting taxonomic ranks in GeoSpecies, DBpedia and the BBC Wildlife Ontology. Such modelling discrepancies impede the linking of congruent taxa across taxonomic registers.
Indeed, one can state the equivalence between two classes (with owl:equivalentClass) or two class instances (with owl:sameAs, skos:exactMatch, etc.), but good practices discourage the alignment of classes with class instances (Baader et al. 2003). Recently, Darwin Core's popularity has fostered the modelling of taxa as instances of class dwc:Taxon (Senderov et al. 2018, Parr et al. 2016). In this context, pragmatism may incline a Linked Data provider to comply with this majority trend to ensure maximum interlinking. Although technically and conceptually valid, this choice entails certain drawbacks. First, considering a taxon only as an instance misses the fact that it is a set of biological individuals with common characteristics. An OWL class captures exactly this semantics through the set of necessary and sufficient conditions that an individual must meet to be a class member. In turn, an OWL reasoner can leverage this knowledge to answer queries or to compute subsumption and instance relationships. By contrast, taxa depicted by class instances are not defined but described by stating their properties. Hence the second drawback: unless we develop bespoke reasoners, there is not much a standard OWL reasoner can deduce from instances. Yet some works have demonstrated the effectiveness of logic representation and reasoning capabilities, e.g. computing the alignments of two primate classifications (Franz et al. 2016) using generic reasoners that nevertheless require proprietary input formats. OWL reasoners are typically designed to solve such classification problems. They may leverage taxonomic ontologies to compute alignments with other ontologies, or apply reasoning to individuals' properties to infer their species. Hence, pragmatically following the instance-based approach may indeed maximize interlinking in the short term, but bears the risk of denying ourselves potentially desirable use cases in the longer term.
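To make the contrast concrete, the following toy sketch in plain Python mimics what a class-based model makes possible: each taxon is defined by a set of necessary and sufficient conditions, from which subsumption and instance relationships can be computed rather than asserted. It is not an OWL reasoner, and it only handles conjunctions of atomic traits. The names TaxonClass, is_subsumed_by and classify, as well as the trait names and the drastically simplified taxon definitions, are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonClass:
    """A taxon defined (not merely described) by the traits its members must have.

    Hypothetical toy structure: the definition is a plain conjunction of traits.
    """
    name: str
    required_traits: frozenset


def is_subsumed_by(sub: TaxonClass, sup: TaxonClass) -> bool:
    # sub is a subclass of sup when every individual meeting sub's conditions
    # necessarily meets sup's conditions, i.e. sup's traits are included in sub's.
    return sup.required_traits <= sub.required_traits


def classify(individual_traits: frozenset, taxa: list) -> list:
    # An individual is an instance of every taxon whose conditions it satisfies.
    return sorted(t.name for t in taxa if t.required_traits <= individual_traits)


# Invented, highly simplified definitions (assumptions, not real diagnoses).
MAMMALIA = TaxonClass("Mammalia", frozenset({"has_hair", "produces_milk"}))
PRIMATES = TaxonClass(
    "Primates",
    frozenset({"has_hair", "produces_milk", "has_grasping_hands"}),
)
TAXA = [MAMMALIA, PRIMATES]

# A specimen described only by its observed properties.
specimen = frozenset(
    {"has_hair", "produces_milk", "has_grasping_hands", "is_nocturnal"}
)

# Both relationships are inferred from the definitions; neither was stated.
primates_under_mammalia = is_subsumed_by(PRIMATES, MAMMALIA)
inferred_taxa = classify(specimen, TAXA)
```

In this sketch, nothing states that Primates is a kind of Mammalia or that the specimen belongs to either taxon; both facts follow from the definitions. Had the taxa been plain records with descriptive properties, as in the instance-based approach, a generic procedure would have no definitions to reason from.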
We believe that developing class-based ontologies for biodiversity should help leverage the SW’s extensive body of theoretical and practical work to tackle a variety of use cases that so far have been addressed with bespoke solutions.