Semantic-aware Query Processing for Activity Trajectories

Users of social networks such as Twitter and Weibo generate massive volumes of geo-tagged records, which reveal their activities in the physical world together with the associated spatio-temporal dynamics. Existing trajectory data management studies mainly analyze the spatio-temporal properties of trajectories and leave the understanding of the underlying activities largely untouched. In this paper, we incorporate semantic analysis of the activity information embedded in trajectories into query modelling and processing, with the aim of providing end users with more accurate and meaningful trip recommendations. To this end, we propose a novel trajectory query that considers not only spatio-temporal closeness but also, more importantly, the semantic relevance between the activities in the data and those in the query, which it captures through probabilistic topic modelling. To support efficient query processing, we design a novel hybrid index structure, the ST-tree, which organizes trajectory points hierarchically and enables us to prune the search space in the spatial and topic dimensions simultaneously. Experimental results on real datasets demonstrate the efficiency and scalability of the proposed index structure and search algorithms.
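To make the idea of a query that weighs spatial closeness against topic relevance concrete, the following is a minimal illustrative sketch and not the scoring function defined in the paper. It assumes each trajectory point carries a 2D location and a topic distribution (for example, one inferred by a topic model), and it combines the two signals linearly. The names `spatial_closeness`, `topic_relevance`, `combined_score`, `rank_points`, the weight `alpha`, the distance-to-closeness mapping, and the use of cosine similarity are all hypothetical choices made for illustration.

```python
import math

def spatial_closeness(p, q, scale=1.0):
    """Map Euclidean distance to (0, 1]; 1.0 means identical locations.
    The 1 / (1 + d) mapping is an assumption, not the paper's definition."""
    d = math.hypot(p[0] - q[0], p[1] - q[1])
    return 1.0 / (1.0 + d / scale)

def topic_relevance(theta_p, theta_q):
    """Cosine similarity between two topic distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(theta_p, theta_q))
    norm = math.sqrt(sum(a * a for a in theta_p)) * math.sqrt(sum(b * b for b in theta_q))
    return dot / norm if norm > 0 else 0.0

def combined_score(point, query, alpha=0.5):
    """Linear mix of spatial and semantic scores; alpha is a hypothetical weight."""
    (loc_p, theta_p), (loc_q, theta_q) = point, query
    return (alpha * spatial_closeness(loc_p, loc_q)
            + (1.0 - alpha) * topic_relevance(theta_p, theta_q))

def rank_points(points, query, k=1, alpha=0.5):
    """Brute-force top-k by combined score (no index, for illustration only)."""
    return sorted(points, key=lambda p: combined_score(p, query, alpha), reverse=True)[:k]

# Each item is (location, topic distribution over two hypothetical topics).
query = ((0.0, 0.0), [0.9, 0.1])
near_offtopic = ((0.1, 0.0), [0.1, 0.9])
far_ontopic = ((1.0, 0.0), [0.9, 0.1])
best = rank_points([near_offtopic, far_ontopic], query, k=1)[0]
```

With equal weights, the farther but semantically matching point outranks the nearby off-topic one; setting `alpha=1.0` reduces the ranking to pure spatial closeness. This brute-force scan examines every point, which is the cost that a hierarchical index pruning in both dimensions is meant to avoid.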
