Similarity Search on Spatio-Textual Point Sets

User-generated content on the Web increasingly has a geospatial dimension. This opens new opportunities and challenges for mining and analyzing user behavior and patterns in location-based services and location-based social networks, with applications ranging from recommendation systems to geo-marketing. Motivated by these needs, querying and analyzing spatio-textual data has received considerable attention in recent years. In this paper, we address the problem of matching point sets based on the spatio-textual objects they contain. This is particularly relevant when the point sets correspond to users, each associated with a collection of geolocated photos or tweets. We formally define this problem as a Spatio-Textual Point-Set Join query and introduce its top-k variant. To process such queries efficiently, we extend state-of-the-art algorithms for spatio-textual joins of individual points to the case of point sets. Finally, we extensively evaluate the proposed methods on large-scale, real-world datasets from Flickr and Twitter.
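To make the query concrete, the following is a minimal brute-force sketch in Python, not the algorithms proposed in the paper. The abstract does not state the similarity definitions, so everything here is an assumption for illustration: two objects are taken to match when their Euclidean distance is at most `eps_loc` and the Jaccard similarity of their keyword sets is at least `eps_doc`, and the similarity of two point sets is taken to be the fraction of their objects that have at least one match in the other set. All function and parameter names are hypothetical.

```python
import heapq
from itertools import combinations
from math import hypot

# An object is assumed to be a tuple (x, y, keywords), with keywords a set of strings.

def jaccard(a, b):
    """Jaccard similarity of two keyword sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def objects_match(o1, o2, eps_loc, eps_doc):
    """Assumed matching rule: close in space AND similar in text."""
    (x1, y1, kw1), (x2, y2, kw2) = o1, o2
    return hypot(x1 - x2, y1 - y2) <= eps_loc and jaccard(kw1, kw2) >= eps_doc

def set_similarity(s1, s2, eps_loc, eps_doc):
    """Assumed measure: fraction of objects with a match in the other set."""
    if not s1 or not s2:
        return 0.0
    m1 = sum(any(objects_match(a, b, eps_loc, eps_doc) for b in s2) for a in s1)
    m2 = sum(any(objects_match(a, b, eps_loc, eps_doc) for a in s1) for b in s2)
    return (m1 + m2) / (len(s1) + len(s2))

def scored_pairs(sets, eps_loc, eps_doc):
    """Yield (id1, id2, similarity) for every unordered pair of point sets."""
    for u, v in combinations(sorted(sets), 2):
        yield u, v, set_similarity(sets[u], sets[v], eps_loc, eps_doc)

def point_set_join(sets, eps_loc, eps_doc, eps_set):
    """Threshold variant: all pairs with similarity >= eps_set."""
    return [p for p in scored_pairs(sets, eps_loc, eps_doc) if p[2] >= eps_set]

def top_k_point_set_join(sets, eps_loc, eps_doc, k):
    """Top-k variant: the k pairs with the highest similarity."""
    return heapq.nlargest(k, scored_pairs(sets, eps_loc, eps_doc), key=lambda p: p[2])

# Toy data: each user is a point set of geotagged, keyword-annotated posts.
users = {
    "alice": [(0.0, 0.0, {"coffee", "paris"}), (10.0, 10.0, {"museum"})],
    "bob": [(0.5, 0.0, {"coffee", "paris", "cafe"}), (50.0, 50.0, {"beach"})],
    "carol": [(100.0, 100.0, {"ski"})],
}
print(point_set_join(users, eps_loc=1.0, eps_doc=0.5, eps_set=0.5))
print(top_k_point_set_join(users, eps_loc=1.0, eps_doc=0.5, k=1))
```

This sketch compares every pair of sets and every pair of objects, so its cost grows quadratically in both. It serves only as a reference for what the query returns under the assumed definitions; the extension of point-level spatio-textual join algorithms mentioned above is what makes such queries practical at scale.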
