Finding Attribute-Aware Similar Regions for Data Analysis

With the proliferation of mobile devices and location-based services, increasingly massive volumes of geo-tagged data are becoming available. This data typically also contains non-location information. We study how to use such information to characterize a region and then how to find a region of the same size with the most similar characteristics. This functionality enables a user to identify regions that share characteristics with a user-supplied region that the user is familiar with and likes. More specifically, we formalize and study a new problem called the attribute-aware similar region search (ASRS) problem. We first define so-called composite aggregators that express aspects of interest in terms of the information associated with a user-supplied region. When applied to a region, an aggregator captures the region's relevant characteristics. Next, given a query region and a composite aggregator, we propose a novel algorithm called DS-Search to find the most similar region of the same size. Unlike previous work on region search, DS-Search repeatedly discretizes and splits regions until a split region either satisfies a drop condition or is guaranteed not to contribute to the result. In addition, we extend DS-Search to solve the ASRS problem approximately. Finally, we report on extensive empirical studies that offer insight into the efficiency and effectiveness of the paper's proposals.
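To make the problem statement concrete, the following is a minimal sketch of a naive baseline for similar region search. It is not DS-Search and not the paper's definition of a composite aggregator. It assumes a single hypothetical aggregator (per-category object counts), an L1 distance between aggregates, and candidate regions whose lower-left corners lie on a fixed grid. The names `Point`, `Rect`, `category_counts`, `l1_distance`, and `naive_similar_region` are illustrative assumptions.

```python
from collections import Counter
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float
    category: str  # non-location attribute (assumed: one label per object)


class Rect(NamedTuple):
    x: float  # lower-left corner
    y: float
    w: float  # width
    h: float  # height


def category_counts(points, rect):
    """Hypothetical aggregator: number of objects per category inside rect."""
    return Counter(
        p.category
        for p in points
        if rect.x <= p.x < rect.x + rect.w and rect.y <= p.y < rect.y + rect.h
    )


def l1_distance(a, b):
    """L1 distance between two count vectors (missing categories count as 0)."""
    return sum(abs(a[k] - b[k]) for k in a.keys() | b.keys())


def naive_similar_region(points, query, extent, step):
    """Scan same-size rectangles on a grid and return the closest one.

    This exhaustive scan only examines grid-aligned candidates, so it is
    an approximation used for illustration, not an exact search.
    """
    target = category_counts(points, query)
    best, best_dist = None, float("inf")
    nx = int((extent.w - query.w) / step) + 1
    ny = int((extent.h - query.h) / step) + 1
    for i in range(nx):
        for j in range(ny):
            cand = Rect(extent.x + i * step, extent.y + j * step, query.w, query.h)
            if cand == query:
                continue  # skip the query region itself
            dist = l1_distance(target, category_counts(points, cand))
            if dist < best_dist:
                best, best_dist = cand, dist
    return best, best_dist


# Usage with made-up data: two cafes and a bar in the query region,
# and the same mix of categories in a region elsewhere.
points = [
    Point(0.5, 0.5, "cafe"), Point(1.5, 0.5, "cafe"), Point(1.0, 1.5, "bar"),
    Point(6.5, 4.5, "cafe"), Point(7.5, 4.5, "cafe"), Point(7.0, 5.5, "bar"),
    Point(4.5, 0.5, "shop"),
]
best, dist = naive_similar_region(
    points, query=Rect(0, 0, 2, 2), extent=Rect(0, 0, 10, 8), step=1.0
)
```

Each candidate is re-aggregated from scratch, so the cost grows with the number of grid positions times the number of objects. A scan of this kind is the sort of baseline that a discretize-and-split strategy with pruning aims to improve on.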
