A New Density-based Spatial Clustering Algorithm for Extracting Attractive Local Regions in Georeferenced Documents

With the growing attention paid to social media, a huge number of georeferenced documents, that is, documents that include location information, are posted on social media sites. People transmit and collect information through these georeferenced documents, which often relate not only to personal topics but also to local topics and events. Extracting “attractive” local regions associated with local topics from georeferenced documents is therefore an important challenge in many application domains. In this paper, we propose a novel spatial clustering algorithm, called the (ε, σ)-density-based spatial clustering algorithm, for extracting “attractive” local regions in georeferenced documents. We define a new type of spatial cluster called an (ε, σ)-density-based spatial cluster. The proposed algorithm can recognize spatial clusters that are semantically separated as well as clusters that are spatially separated. To evaluate the proposed algorithm, we use geo-tagged tweets posted on Twitter. The experimental results show that the (ε, σ)-density-based spatial clustering algorithm can extract “attractive” local regions as (ε, σ)-density-based spatial clusters.
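
The abstract does not spell out the definitions, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that ε is a spatial neighborhood radius and σ is a semantic similarity threshold, uses Jaccard similarity over term sets as a stand-in similarity measure, and adds a hypothetical `min_docs` density parameter in the style of DBSCAN. All function names, parameters, and sample data are illustrative.

```python
import math
from collections import deque

def jaccard(a, b):
    """Jaccard similarity of two term sets (assumed stand-in for semantic similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def neighbors(docs, i, eps, sigma):
    """Indices of documents within distance eps of docs[i] AND at least sigma-similar to it."""
    xi, yi, ti = docs[i]
    return [
        j for j, (xj, yj, tj) in enumerate(docs)
        if math.dist((xi, yi), (xj, yj)) <= eps and jaccard(ti, tj) >= sigma
    ]

def eps_sigma_cluster(docs, eps, sigma, min_docs):
    """DBSCAN-style expansion; returns one label per document, -1 meaning noise."""
    labels = [None] * len(docs)
    cluster_id = 0
    for i in range(len(docs)):
        if labels[i] is not None:
            continue
        seeds = neighbors(docs, i, eps, sigma)
        if len(seeds) < min_docs:  # hypothetical density threshold
            labels[i] = -1
            continue
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border document
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(docs, j, eps, sigma)
            if len(nbrs) >= min_docs:  # only core documents expand the cluster
                queue.extend(nbrs)
        cluster_id += 1
    return labels

# Illustrative data: (x, y, terms). Planar coordinates are assumed for simplicity.
docs = [
    (0.0, 0.0, {"ramen", "lunch"}),
    (0.1, 0.0, {"ramen", "lunch"}),
    (0.0, 0.1, {"ramen", "noodle", "lunch"}),
    (0.05, 0.05, {"concert", "live"}),
    (0.1, 0.1, {"concert", "live"}),
    (0.0, 0.05, {"concert", "live", "stage"}),
    (5.0, 5.0, {"ramen", "lunch"}),
    (5.1, 5.0, {"ramen", "lunch"}),
    (5.0, 5.1, {"ramen", "lunch"}),
    (9.0, 9.0, {"rain"}),
]
labels = eps_sigma_cluster(docs, eps=0.5, sigma=0.5, min_docs=3)
print(labels)
```

In this toy data, the first two groups occupy the same area but discuss different topics, so they end up in different clusters (semantic separation), while the third group shares a topic with the first but lies far away (spatial separation). The last document has no close, similar neighbors and is labeled as noise.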
