Best Keyword Cover Search

Objects in a spatial database (e.g., restaurants or hotels) are commonly associated with keywords that indicate their businesses, services, or features. An interesting problem known as Closest Keywords search is to query a set of objects, called a keyword cover, which together cover a set of query keywords and have the minimum inter-object distance. In recent years, keyword ratings have become increasingly available and increasingly important in evaluating objects for better decision making. This motivates us to investigate a generic version of Closest Keywords search, called Best Keyword Cover, which considers the keyword ratings of objects as well as the inter-object distance. The baseline algorithm is inspired by the methods of Closest Keywords search, which exhaustively combine objects from different query keywords to generate candidate keyword covers. As the number of query keywords increases, the performance of the baseline algorithm drops dramatically because of the massive number of candidate keyword covers generated. To address this drawback, this work proposes a much more scalable algorithm called keyword nearest neighbor expansion (keyword-NNE). Compared to the baseline algorithm, the keyword-NNE algorithm significantly reduces the number of candidate keyword covers generated. In-depth analysis and extensive experiments on real data sets confirm the advantage of the keyword-NNE algorithm over the baseline.
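The exhaustive baseline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `SpatialObject` type, the `score` function (a weighted sum of closeness and the lowest rating in the cover), and the parameters `alpha` and `max_dist` are assumptions made for illustration only, since the abstract does not define how distance and rating are combined.

```python
from dataclasses import dataclass
from itertools import combinations, product
from math import dist


@dataclass(frozen=True)
class SpatialObject:
    """Hypothetical record: one object carrying one keyword and a rating."""
    name: str
    x: float
    y: float
    keyword: str
    rating: float  # assumed to be normalized to [0, 1]


def diameter(cover):
    """Largest pairwise Euclidean distance among the objects in a cover."""
    return max(
        (dist((a.x, a.y), (b.x, b.y)) for a, b in combinations(cover, 2)),
        default=0.0,
    )


def score(cover, alpha=0.5, max_dist=10.0):
    """Assumed scoring function (not taken from the paper).

    Rewards covers whose objects are close together and whose
    lowest-rated object still has a high rating.
    """
    closeness = 1.0 - min(diameter(cover), max_dist) / max_dist
    worst_rating = min(o.rating for o in cover)
    return alpha * closeness + (1.0 - alpha) * worst_rating


def baseline_best_cover(objects, query_keywords, alpha=0.5, max_dist=10.0):
    """Exhaustively combine one object per query keyword and keep the best."""
    groups = [[o for o in objects if o.keyword == k] for k in query_keywords]
    if any(not g for g in groups):
        return None  # some query keyword has no matching object
    # product(*groups) enumerates every candidate keyword cover; the count is
    # the product of the group sizes, which is why this scales poorly as the
    # number of query keywords grows.
    return max(product(*groups), key=lambda c: score(c, alpha, max_dist))


objects = [
    SpatialObject("A", 0.0, 0.0, "hotel", 0.9),
    SpatialObject("B", 5.0, 5.0, "hotel", 0.5),
    SpatialObject("C", 1.0, 0.0, "restaurant", 0.8),
    SpatialObject("D", 0.0, 1.0, "restaurant", 0.3),
]
best = baseline_best_cover(objects, ["hotel", "restaurant"])
print([o.name for o in best])
```

With two query keywords and two objects per keyword, the sketch scores only four candidates, but the candidate count is the product of the per-keyword group sizes, so it multiplies with every additional query keyword. This is the blow-up that the abstract attributes to the baseline and that keyword-NNE is designed to reduce.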
