Recent years have witnessed an explosion of geospatial data, especially in the form of Volunteered Geographic Information (VGI). As a prominent example, OpenStreetMap (OSM) builds a free, editable map of the world from the work of a large number of contributors. Meanwhile, social media platforms such as Twitter and Instagram supply dynamic social feeds at population level. Because much of this data is geo-tagged, there is high potential for integrating social media with OSM to enrich OSM with semantic annotations, which complement the existing annotations oriented toward objective description and so provide a broader range of annotations. In this paper, we propose a comprehensive framework for integrating social media data and VGI data to derive knowledge about geographical objects, specifically the most relevant annotations from tweets for objects in OSM. We first integrate geo-tagged tweets with OSM data using scalable spatial queries running on MapReduce. We then propose a frequency-based method for annotating boundary-based geographic objects and a probability-based method for annotating point-based geographic objects, both designed with noise in mind. We evaluate our methods on a large corpus of geo-tagged tweets and representative geographic objects from OSM; ground-truth comparison and case studies show promising results. We are able to produce up to 80% correct names for geographical objects and to discover implicitly relevant information, such as the popular exhibitions of a museum or the nicknames and visitors' impressions of a tourist attraction.
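
To make the frequency-based idea concrete, the following is a minimal in-memory sketch and not the authors' implementation. It assumes that a boundary-based object is given as a simple polygon, that tweets are `(x, y, text)` tuples, and that a term's relevance is the number of tweets inside the boundary that mention it. The names `point_in_polygon` and `top_annotations` are hypothetical. The framework described above runs the spatial join on MapReduce and accounts for noise; this sketch does neither beyond an optional stopword filter.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def top_annotations(polygon, tweets, k=3, stopwords=frozenset()):
    """Return the k terms mentioned by the most tweets located inside polygon.

    Assumption: tweets is an iterable of (x, y, text) tuples, and each term
    is counted at most once per tweet so one verbose tweet cannot dominate.
    """
    counts = Counter()
    for x, y, text in tweets:
        if point_in_polygon(x, y, polygon):
            terms = {t for t in text.lower().split() if t not in stopwords}
            counts.update(terms)
    return [term for term, _ in counts.most_common(k)]
```

Calling `top_annotations(boundary, tweets, k=5)` on a museum's boundary polygon would return the five terms that appear in the most tweets posted from inside it. A real pipeline would replace the linear scan with an indexed or partitioned spatial join.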