Big POI data integration with Linked Data technologies

Point of Interest (POI) data constitute the cornerstone of any application, service or product even remotely related to our physical surroundings. From navigation applications to social networks, tourism, and logistics, we use POI data to search, communicate, decide and plan our actions. POIs are semantically diverse and spatio-temporally evolving entities, having geographical, temporal and thematic relations. Currently, integrating POI data to increase their coverage, timeliness, accuracy and value is a resource-intensive and mostly manual process, with no specialized software available to address the specific challenges of this task. In this paper, we present an integrated toolkit for transforming, linking, fusing and enriching POI data, and extracting additional value from them. In particular, we demonstrate how Linked Data technologies can address the limitations, gaps and challenges of the current landscape in Big POI data integration. We have built a prototype application that enables users to define, manage and execute scalable POI data integration workflows built on top of state-of-the-art software for geospatial Linked Data. The application abstracts and hides away the underlying complexity, automates quality-assured integration, scales efficiently for world-scale integration tasks and lowers the entry barrier for end-users. Validated against real-world POI datasets in several application domains, our system has shown great potential to address the requirements and needs of cross-sector, cross-border and cross-lingual integration of Big POI data.
