Radon results for OAEI 2017

Datasets containing billions of geospatial resources are increasingly being represented according to the Linked Data principles. Radon is an efficient solution for discovering topological relations between such geospatial resources according to the DE-9IM standard. Radon uses a sparse space tiling index in combination with minimum bounding boxes to reduce the computation time of topological relations. In this paper, we present the participation of Radon in the OAEI 2017 campaign. The OAEI results show that Radon significantly outperforms the other state-of-the-art systems in most cases.

1 Presentation of the system

Radon is a time-efficient link discovery algorithm for topological relations between geospatial resources, implemented within Limes [3]. Given two sets of RDF resources S and T and a relation R, the goal of link discovery is to find the mapping M = {(s, t) ∈ S × T : R(s, t)}. Radon enables the time-efficient discovery of all topological relations that can be defined in terms of the DE-9IM standard [1]. To achieve this efficiency, Radon uses two optimization techniques: optimized sparse space tiling at the dataset level and Minimum Bounding Box (MBB)-based filtering at the resource level. In the following, we introduce the basic concepts needed to understand Radon before we outline these optimization techniques. More detailed explanations can be found in [5].

The Minimum Bounding Box (MBB) of a geometry g in n dimensions is the rectangular box with the smallest measure (area, volume, or hypervolume in higher dimensions) within which all points of g lie. Another term for MBB is envelope.

Space tiling is a technique for indexing spatial data in which n-dimensional affine spaces are split into any number of hyperrectangles with edge lengths ℓ_i and granularity factors ∆_i = ℓ_i^(-1), where i ∈ {1, ..., n}. These hyperrectangles can then be addressed using vectors from ℕ^n, which allows for various optimizations.
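The two concepts above can be illustrated with a minimal Python sketch for the two-dimensional case. It is not taken from the Radon implementation in Limes; the function names `mbb` and `tiles` are hypothetical, and the granularity factor is assumed here to be the number of grid cells per unit length (the inverse of the cell edge length). Cell addresses are allowed to be negative integers for simplicity.

```python
import math

def mbb(points):
    """Return the 2D minimum bounding box (min_x, min_y, max_x, max_y)
    of a sequence of (x, y) points, e.g. the vertices of a LineString."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def tiles(box, delta_x, delta_y):
    """Return the integer addresses of all grid cells that the box spans.

    delta_x and delta_y are the assumed granularity factors
    (cells per unit length along each axis).
    """
    min_x, min_y, max_x, max_y = box
    cols = range(math.floor(min_x * delta_x), math.floor(max_x * delta_x) + 1)
    rows = range(math.floor(min_y * delta_y), math.floor(max_y * delta_y) + 1)
    return {(c, r) for c in cols for r in rows}

# Hypothetical LineString with three vertices.
line = [(0.2, 0.3), (1.7, 0.9), (2.4, 1.1)]
box = mbb(line)                 # (0.2, 0.3, 2.4, 1.1)
cells = tiles(box, 1.0, 1.0)
print(sorted(cells))            # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

A larger granularity factor yields smaller cells, so the same MBB is mapped to more cell addresses; in this sketch, doubling both factors maps the example box to 15 cells instead of 6.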
1.1 Optimized Sparse Space Tiling

The goal of the optimized sparse space tiling is to generate an index I that maps all geometries s ∈ S and t ∈ T to sets of hyperrectangles. For the sake of clarity, the following description focuses on the two-dimensional case. As a first step, we use a heuristic to obtain good granularity factors (∆_φ, ∆_λ) for the latitude and longitude dimensions. Then, we apply space tiling, mapping each geometry g to the set of hyperrectangles that its MBB spans. To implement this idea, we insert a reference to g into each of these hyperrectangles, which are realized as entries of a HashMap.

To optimize (i.e., sparsify) the generated index, we start by computing the estimated total hypervolumes (eth) of the datasets S and T. We first index the dataset with the smaller eth. Then, for each resource of the other dataset, we add it to I only if it shares hyperrectangles with resources of the first dataset that are already contained in I. Using this technique together with the HashMap implementation of the hyperrectangle index significantly reduces the size of the generated data structure and, consequently, the time needed to traverse it.

1.2 MBB-based Filtering

After the optimized sparse space tiling step described above, we traverse the generated index, visiting one hyperrectangle at a time. As a consequence of our approach, each generated hyperrectangle contains references to at least one geometry from each dataset. For each pair (s, t) of geometries, where s ∈ S and t ∈ T, we then employ a filtering step before triggering the potentially expensive (for large geometries) computation that checks whether the given relation holds. Let MBB(g) denote the MBB of geometry g. The filtering step leverages the fact that ¬r(MBB(s), MBB(t)) ⇒ ¬r(s, t) holds for every relation r in which one geometry has no interior or boundary points in the exterior of the other geometry, i.e., s ⊆ t or t ⊆ s.
For these relations, we can return false and skip further computations if the geometries' MBBs do not satisfy the relation.

2 Adaptations made for the evaluation

No specific adaptations were made to the original Radon algorithm [5]; we only provide a Java SystemAdapter according to the campaign guidelines (https://goo.gl/cWmZ5P). The final source code of the Radon Java SystemAdapter is available online on the project website (https://goo.gl/awkvvo).

3 Evaluation Results

Radon has been evaluated only in Task 2 (Spatial) of the Hobbit Link Discovery Track. The basic idea behind this task was to measure how well the systems can identify DE-9IM (Dimensionally Extended nine-Intersection Model) topological relations. The supported spatial relations were: Disjoint, Touches, Contains/Within, Covers/CoveredBy, Intersects, Crosses, Overlaps. The geospatial traces were represented in Well-known Text (WKT) format as LineStrings. Given two sets of LineString geometries S and T and a DE-9IM topological relation R, the participants were assigned the task of retrieving the mapping M = {(s, t) ∈ S × T : R(s, t)}.

All systems were tested on two datasets: (1) the sandbox dataset, with a scale of 10 instances, and (2) the mainbox dataset, with a scale of 2K instances. Besides Radon, the participants in this task were AgreementMakerLight (AML), OntoIdea, and Silk. The systems were judged on precision, recall, F-measure, and run time. The final results are shown in Table 1 and Figures 1 and 2. Note that we present only the run times and not precision, recall, and F-measure, as all of these were equal to 1.0, except for OntoIdea on Touches and Overlaps, where the value is 0.99. From these results we can see that, while Radon performs in the middle of the field on the sandbox dataset, it outperforms the other participants on most relations on the mainbox dataset.
Notably, the optimization described in Section 1.2 speeds up the relations Equals, Contains, Within, Covers, and CoveredBy significantly in comparison to the remaining relations. The performance differences between Touches and Intersects, where AML outperforms Radon, and Overlaps cannot be explained from an implementation point of view, as these three relations share exactly the same optimizations. However, because the datasets consist exclusively of LineStrings, it is apparent that Touches and Intersects are much more likely to hold between any two geometries than Overlaps. Therefore, the benchmarks on these relations are the hardest in this task.
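To make the two optimizations from Sections 1.1 and 1.2 more concrete, the following Python sketch combines a sparse tile index with an MBB filter for the Within relation. It is a simplified illustration under stated assumptions, not Radon's implementation in Limes: all function names are hypothetical, the estimated total hypervolume is approximated by the summed MBB area, a single granularity factor `delta` is used for both axes, and the exact DE-9IM check is left to a caller-supplied function `exact_within`.

```python
import math
from collections import defaultdict

def mbb(points):
    """2D minimum bounding box (min_x, min_y, max_x, max_y) of (x, y) points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def tiles(box, delta):
    """Addresses of all grid cells spanned by the box (delta = cells per unit)."""
    min_x, min_y, max_x, max_y = box
    return {(c, r)
            for c in range(math.floor(min_x * delta), math.floor(max_x * delta) + 1)
            for r in range(math.floor(min_y * delta), math.floor(max_y * delta) + 1)}

def eth(boxes):
    """Stand-in for the estimated total hypervolume: summed MBB area (assumption)."""
    return sum((b[2] - b[0]) * (b[3] - b[1]) for b in boxes.values())

def sparse_index(first, second, delta):
    """Index `first` completely, then add `second` only to cells that already exist."""
    index = defaultdict(lambda: ([], []))
    for key, box in first.items():
        for cell in tiles(box, delta):
            index[cell][0].append(key)
    for key, box in second.items():
        for cell in tiles(box, delta):
            if cell in index:  # membership test does not create new cells
                index[cell][1].append(key)
    # Keep only cells that reference geometries from both datasets.
    return {cell: pair for cell, pair in index.items() if pair[1]}

def mbb_within(a, b):
    """True if box a lies inside box b."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]

def discover_within(S, T, exact_within, delta=1.0):
    """Return all pairs (s, t) with s within t.

    S and T map identifiers to lists of (x, y) vertices; exact_within is the
    caller-supplied exact relation check (hypothetical placeholder).
    """
    s_boxes = {k: mbb(g) for k, g in S.items()}
    t_boxes = {k: mbb(g) for k, g in T.items()}
    swap = eth(t_boxes) < eth(s_boxes)  # index the dataset with smaller eth first
    first, second = (t_boxes, s_boxes) if swap else (s_boxes, t_boxes)
    seen, result = set(), set()
    for first_ids, second_ids in sparse_index(first, second, delta).values():
        for a in first_ids:
            for b in second_ids:
                s, t = (b, a) if swap else (a, b)
                if (s, t) in seen:  # a pair can share several cells
                    continue
                seen.add((s, t))
                if not mbb_within(s_boxes[s], t_boxes[t]):
                    continue  # MBB filter: the exact check cannot succeed
                if exact_within(S[s], T[t]):
                    result.add((s, t))
    return result

def segment_within(s, t):
    """Exact check valid only for horizontal two-vertex segments (demo helper)."""
    (sx1, sy), (sx2, _) = s
    (tx1, ty), (tx2, _) = t
    return sy == ty and min(tx1, tx2) <= min(sx1, sx2) and max(sx1, sx2) <= max(tx1, tx2)

S = {"s1": [(1.0, 1.0), (2.0, 1.0)], "s2": [(5.0, 5.0), (9.0, 5.0)],
     "s3": [(20.0, 20.0), (21.0, 20.0)]}
T = {"t1": [(0.0, 1.0), (3.0, 1.0)], "t2": [(6.0, 5.0), (7.0, 5.0)]}
print(discover_within(S, T, segment_within))  # {('s1', 't1')}
```

In this toy example, s3 never meets a geometry from T in any cell, and the pair (s2, t2) is discarded by the MBB filter, so the exact check runs only once. This mirrors why containment-style relations benefit most from the filter, while relations such as Touches or Intersects cannot be rejected in the same way.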