A systematic survey of point set distance measures for link discovery

Large amounts of geo-spatial information have been made available with the growth of the Web of Data. While discovering links between resources on the Web of Data has been shown to be a demanding task, discovering links between geo-spatial resources proves to be even more challenging. This is partly because these resources are described by means of vector geometry. In particular, discrepancies in granularity and measurement errors across data sets make it difficult to select appropriate distance measures for geo-spatial resources. In this paper, we survey the existing literature for point-set measures that can be used to measure the similarity of vector geometries. We then present and evaluate the ten measures that we derived from the literature. We evaluate these measures with respect to their time efficiency and their robustness against discrepancies in measurement and in granularity. To this end, we use samples of real data sets of different granularity as input for our evaluation framework. The results obtained on three different data sets suggest that most distance approaches can be made to scale. Moreover, while some distance measures are significantly slower than others, distance measures based on means, surjections and sums of minimal distances are robust against the different types of discrepancies.
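
To make the notion of a point-set distance concrete, the following minimal sketch illustrates two of the measure families named above, the mean-based distance and the sum of minimal distances, on a toy pair of geometries that differ only in granularity. It is an illustration under stated assumptions, not the evaluated implementation: the function names and sample coordinates are hypothetical, and plain Euclidean distance stands in for whatever point-to-point distance a geo-spatial setting would actually call for.

```python
from math import dist  # Euclidean point-to-point distance (an assumption of this sketch)


def mean_distance(a, b):
    """Distance between the centroids (coordinate-wise means) of two point sets."""
    centroid_a = tuple(sum(coords) / len(a) for coords in zip(*a))
    centroid_b = tuple(sum(coords) / len(b) for coords in zip(*b))
    return dist(centroid_a, centroid_b)


def sum_of_min_distance(a, b):
    """Half the sum of nearest-neighbour distances, taken in both directions."""
    a_to_b = sum(min(dist(p, q) for q in b) for p in a)
    b_to_a = sum(min(dist(p, q) for p in a) for q in b)
    return 0.5 * (a_to_b + b_to_a)


# Hypothetical data: the same square described at two granularities.
coarse = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
fine = coarse + [(1.0, 0.0), (2.0, 1.0), (1.0, 2.0), (0.0, 1.0)]

print(mean_distance(coarse, fine))
print(sum_of_min_distance(coarse, fine))
```

In this toy case the centroids coincide, so the mean-based distance is unaffected by the extra edge midpoints, whereas the sum of minimal distances grows with every added point that has no exact counterpart in the coarser set.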
