Top-k coupled keyword recommendation for relational keyword queries

Suggesting the top-k typical relevant keyword queries helps users who cannot formulate appropriate queries to express their imprecise query intentions. This paper proposes a keyword query suggestion approach that extracts the semantic relationships between keywords and between keyword queries, and returns typical queries that are semantically related to a given query. First, we propose a keyword coupling relationship measure that considers both the intra- and inter-couplings between each pair of keywords. The semantic similarity between keyword queries is then measured with a semantic matrix that preserves the coupling relationships between the keywords in the queries. Based on these query semantic similarities, we propose an approximation algorithm that uses probability density estimation to find the most typical queries in the query history. Finally, we propose a threshold-based top-k query selection method that evaluates the top-k typical relevant queries efficiently. We demonstrate that the keyword coupling relationship and query semantic similarity measures accurately capture the coupling relationships between keywords and the semantic similarities between keyword queries. We also demonstrate the efficiency of the query typicality analysis and of the top-k query selection algorithm.
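To make the notion of query typicality concrete, the following is a minimal sketch that scores each query in a history by a Gaussian kernel density estimate and returns the k highest-scoring queries. It is an illustration under stated assumptions, not the algorithm of this paper: the Jaccard distance stands in for the coupling-based semantic similarity, the bandwidth value and the function names (`jaccard_distance`, `typicality`, `top_k_typical`) are hypothetical, and the exhaustive sort stands in for both the approximation algorithm and the threshold-based top-k selection.

```python
import math


def jaccard_distance(q1, q2):
    """Stand-in distance between two keyword queries (assumption:
    the paper uses a coupling-based semantic similarity instead)."""
    a, b = set(q1), set(q2)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def typicality(queries, distance, bandwidth=0.5):
    """Gaussian kernel density estimate of each query with respect to
    the whole history; a higher value means a more typical query."""
    n = len(queries)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)
    scores = []
    for qi in queries:
        total = sum(
            math.exp(-distance(qi, qj) ** 2 / (2.0 * bandwidth ** 2))
            for qj in queries
        )
        scores.append(total / norm)
    return scores


def top_k_typical(queries, k, distance=jaccard_distance, bandwidth=0.5):
    """Naive baseline: score every query, sort, keep the first k.
    This costs O(n^2) distance evaluations and uses no early stopping."""
    scores = typicality(queries, distance, bandwidth)
    ranked = sorted(zip(queries, scores), key=lambda p: p[1], reverse=True)
    return ranked[:k]


# Hypothetical query history, for illustration only.
history = [
    ("hotel", "paris", "cheap"),
    ("hotel", "paris"),
    ("hotel", "paris", "budget"),
    ("flight", "tokyo"),
]
result = top_k_typical(history, k=2)
all_scores = typicality(history, jaccard_distance)
```

In this toy history the query ("hotel", "paris") lies closest to the other hotel queries, so it receives the highest density, while ("flight", "tokyo") shares no keyword with any other query and receives the lowest. The quadratic cost of this baseline is the kind of expense that an approximation algorithm and a threshold-based selection would aim to avoid.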
