Curating and Enabling Discovery of New York State Climate Change Science Content using Semantic Web Technologies

The New York Climate Change Science Clearinghouse (NYCCSC) project arose in response to the need for a central portal for policy makers to access vetted data and information relevant to climate change adaptation and mitigation in New York State. This project involves multiple components including a publicly available website encompassing a software application that utilizes semantic web technologies to represent and categorize diverse types of data and information in a manner optimized for search. This application extends the VIVO open ontology and software suite (http://vivoweb.org), leveraging VIVO’s ability to represent people, projects, and organizations and adding the capability for representing climate change science concepts and specific relationships between those concepts and data, maps, and documents.

1 NYCCSC Project: Background and Extension of VIVO

The impacts of climate change on natural and built infrastructure are wide-reaching and affect multiple sectors identified in the ClimAID report [1] such as water resources and transportation. Policy makers and researchers tasked with understanding these effects in their local regions and formulating adaptation and mitigation strategies must utilize information from multiple, diverse and scattered sources and, therefore, may draw different conclusions based on the information they encounter. The NYCCSC project emerged as a means to address these needs and aims to provide a web-based portal enabling access to data and information relevant to climate change adaptation and mitigation across New York State. Enabling a central source of regional information will support policy makers and researchers in assessing and addressing the effects of climate change. The curation process for determining what content is referenced in the clearinghouse is headed by a subject matter expert and will integrate feedback from sector experts. The NYCCSC site is scheduled to go into production by early 2016.
For this project, we extended the VIVO application and ontology [2, 3] to support ongoing curation and to represent the rich semantic relationships between the different types of content accessible through the clearinghouse. As part of this extension, we incorporated a new NYCCSC ontology representing climate change concepts and their relationships with the curated content. Situating this work within the larger VIVO ecosystem provides opportunities for aligning this ontology development with the ontological directions of the larger VIVO/VIVO-ISF communities. This semantic approach has afforded great flexibility in modeling content and has supported both discovery through the front end and curation using the NYCCSC VIVO instance. The semantic infrastructure also provides the flexibility and robustness needed for a possible expansion of the clearinghouse beyond New York to additional states. With members of the larger VIVO community exploring methods for utilizing linked data to enrich VIVO content or to link multiple VIVO instances, the NYCCSC VIVO instance can become one of many similar regional instances that share and expose information through a combination of linked data requests and SPARQL queries that populate search indices.

1.1 Ontology Design: Extending VIVO and adding climate change concepts

Fig. 1. Overview of subset of NYCCSC ontology and underlying VIVO ontology. The prefixes correspond to the BIBO [4], DCAT [5], Event [6], FOAF [7], OBO [8], PROV-O [9], and SKOS [10] ontologies.

VIVO is an open-source semantic application that supports representing academic research communities. VIVO extends the ontology instance editor and display application called Vitro [2, 3]. VIVO provides functionality for defining, displaying, and searching for researcher profiles and utilizes the VIVO-ISF ontology for describing information about researchers and their associated publications, grants, and organizations.
More than 100 institutions and agencies across more than 30 countries are implementing VIVO or producing VIVO-compatible data. Extending VIVO allows us to leverage the linked data functionality in VIVO and to enable the information in the clearinghouse to be accessible through linked data to other linked data applications.

We constructed the NYCCSC ontology to represent climate change concepts and to support both curation and discoverability of content related to these concepts. These semantic relationships are central to both curation and search and discovery, forming the core of the curation process of the content that will be linked or made accessible through the clearinghouse site as well as informing the faceting and search in the Solr search index driving the clearinghouse site search. Figure 1 shows a subset of the NYCCSC ontology and a part of the related underlying VIVO ontology. All nodes in the image are within the NYCCSC ontology namespace except for those with a prefix preceding the class name.

2 Search and Discovery Example

Fig. 2. Screenshot excerpt showing results for “heavy precipitation events” filtered by one of the Effects facet options and further refined by zooming into the map near the Syracuse region.

The NYCCSC VIVO extension added multiple queries to populate the Solr search index facets used on the front end. Figure 3, which represents individual resources using rectangular shapes and classes using ovals, shows the semantic relationships which populate the facet values leading to the “Save the Rain” website result in the clearinghouse search displayed in Figure 2. The “Effect addressed by” relationship is used to populate the Effect facet, enabling the selection of a particular Effect facet value in the interface to filter search results to those that utilize this semantic relationship.

Fig. 3. Relationships supporting discovery of resources related to heavy precipitation events
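
The way a semantic relationship such as “Effect addressed by” can drive a search facet may be illustrated with a minimal Python sketch. It is not the NYCCSC implementation: the identifiers (`nyccsc:effectAddressedBy`, the `effect:` and `resource:` names), the `effect_facet` field name, the direction of the relationship (effect as subject, resource as object), and the second placeholder resource are all assumptions made for illustration. Plain tuples stand in for the triple store, and a list of dictionaries stands in for the Solr index.

```python
from collections import defaultdict

# Hypothetical predicate names; the source names the "Effect addressed by"
# relationship but does not give its URI or direction.
EFFECT_ADDRESSED_BY = "nyccsc:effectAddressedBy"
LABEL = "rdfs:label"

# Toy triples (subject, predicate, object). Only "Save the Rain" and
# "heavy precipitation events" come from the text; the rest is placeholder data.
TRIPLES = [
    ("effect:heavy-precipitation", LABEL, "Heavy precipitation events"),
    ("effect:example-b", LABEL, "Example effect B"),
    ("resource:save-the-rain", LABEL, "Save the Rain"),
    ("resource:example-b", LABEL, "Example resource B"),
    ("effect:heavy-precipitation", EFFECT_ADDRESSED_BY, "resource:save-the-rain"),
    ("effect:example-b", EFFECT_ADDRESSED_BY, "resource:example-b"),
]


def build_facet_docs(triples):
    """Turn effect-to-resource links into index documents with a facet field."""
    labels = {s: o for s, p, o in triples if p == LABEL}
    effects_by_resource = defaultdict(list)
    for s, p, o in triples:
        if p == EFFECT_ADDRESSED_BY:
            # The resource (object) receives the effect's label as a facet value.
            effects_by_resource[o].append(labels.get(s, s))
    return [
        {"id": res, "label": labels.get(res, res), "effect_facet": sorted(vals)}
        for res, vals in sorted(effects_by_resource.items())
    ]


def filter_by_effect(docs, facet_value):
    """Mimic selecting one Effect facet value in the search interface."""
    return [d for d in docs if facet_value in d["effect_facet"]]


docs = build_facet_docs(TRIPLES)
hits = filter_by_effect(docs, "Heavy precipitation events")
print([d["label"] for d in hits])  # ['Save the Rain']
```

In a deployed system the toy triples would presumably be replaced by the results of SPARQL queries against the VIVO triple store, and the documents would be sent to Solr rather than filtered in memory. The sketch only shows the shape of the mapping from relationship to facet value.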