Spatial Keyword Querying: Ranking Evaluation and Efficient Query Processing

Due to the widespread adoption of mobile devices with positioning capabilities, notably smartphones, users increasingly search for geographically nearby information on search engines. Further, an analysis of user behavior finds that users not only search for local content, but also take action with respect to search results. In step with these developments, the research community has proposed various kinds of spatial keyword queries that return ranked lists of relevant points of interest. These proposals generally come with advanced query processing techniques, the goal being to make it possible for users to find relevant information quickly. Most of the proposals employ a simple ranking function that takes only textual relevance and spatial proximity into account. While these proposals study query processing efficiency, they are generally weak when it comes to evaluating the result rankings. We believe that ranking evaluation for spatial keyword queries is important since it is directly related to user satisfaction.

The thesis addresses several challenges related to ranking evaluation for spatial keyword queries. The first challenge is forming ground-truth rankings for spatial keyword queries that reflect user preferences. The main idea is that the more similar an output ranking is to the ground-truth ranking, the better the output ranking is. The thesis proposes methods based on crowdsourcing and vehicle trajectories to address this challenge. These methods make it possible for researchers to propose novel ranking functions and to assess the performance of these functions. As such, the thesis takes a step towards more advanced and complex ranking functions that correspond better to user preferences. The contributions of the thesis can also be used to evaluate hypotheses regarding different keywords and geographical regions. Along these lines, it might be possible to employ different ranking functions for different queries in the same system.
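To make the evaluation idea concrete, the following is a minimal sketch and not the method of the thesis. It assumes a ranking function that is a weighted sum of spatial proximity and textual relevance, and it compares an output ranking to a ground-truth ranking with a normalized Kendall tau distance. The weight `alpha`, the normalization constant `max_dist`, the term-overlap relevance measure, and the toy points of interest and ground-truth ranking are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def score(query_loc, query_terms, poi, alpha=0.5, max_dist=10.0):
    """Weighted sum of spatial proximity and textual relevance, each in [0, 1].

    alpha and max_dist are illustrative assumptions, not values from the thesis.
    """
    dist = math.dist(query_loc, poi["loc"])
    proximity = max(0.0, 1.0 - dist / max_dist)
    terms = set(query_terms)
    # Textual relevance here is simply the fraction of query terms matched.
    relevance = len(terms & set(poi["terms"])) / len(terms)
    return alpha * proximity + (1.0 - alpha) * relevance


def rank(query_loc, query_terms, pois, alpha=0.5):
    """Return POI ids ordered by descending score."""
    ordered = sorted(
        pois,
        key=lambda p: score(query_loc, query_terms, p, alpha),
        reverse=True,
    )
    return [p["id"] for p in ordered]


def kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of item pairs that the two rankings order differently.

    Assumes both rankings contain the same items (at least two).
    0.0 means identical order, 1.0 means reversed order.
    """
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))
    discordant = sum(
        1
        for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)


# Hypothetical toy data: three points of interest and one query.
pois = [
    {"id": "a", "loc": (0.0, 1.0), "terms": ["pizza", "restaurant"]},
    {"id": "b", "loc": (0.0, 2.0), "terms": ["pizza"]},
    {"id": "c", "loc": (0.0, 8.0), "terms": ["pizza", "restaurant"]},
]
query_loc = (0.0, 0.0)
query_terms = ["pizza", "restaurant"]

# Hypothetical ground-truth ranking, e.g. as obtained from crowd workers.
ground_truth = ["a", "c", "b"]

balanced = rank(query_loc, query_terms, pois, alpha=0.5)   # ['a', 'b', 'c']
text_heavy = rank(query_loc, query_terms, pois, alpha=0.2)  # ['a', 'c', 'b']

print(kendall_tau_distance(balanced, ground_truth))    # one of three pairs differs
print(kendall_tau_distance(text_heavy, ground_truth))  # 0.0
```

In this toy setting, the ranking function that puts more weight on textual relevance agrees with the assumed ground truth and would therefore be judged the better of the two for this query.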
The thesis also addresses the problem of detecting visits to points of interest in a GPS dataset and proposes algorithms to tackle this problem. These visits offer insight into which points of interest are relevant to drivers and provide a means of ranking points of interest. More specifically, the thesis first proposes a technique based on crowd-
