Efficient keyword search over graph-structured data based on minimal covered r-cliques

Keyword search is an alternative to structured query languages for querying graph-structured data. A result of a keyword query is a connected structure that covers all or part of the queried keywords. Textual coverage and structural compactness are known to be the two main properties of a relevant result. Many previous works first retrieve all candidate results and only then assess these properties by comparing the candidates with a ranking function. This requires a time-consuming search process, which is unsuitable for interactive systems in which users expect results as quickly as possible. Recent works address this problem by restricting the shape of results so that coverage and compactness can be examined during the search. However, the results retrieved by these methods can still contain redundant nodes. In this paper, we introduce the semantics of the minimal covered r-clique (MCCr) for the results of a keyword query, as an extension of existing definitions. We propose efficient algorithms that detect the MCCrs of a given query and retrieve a comprehensive, duplicate-free set of MCCrs. These algorithms can also be executed in a distributed manner, which sets them apart in the field of keyword search. We further propose approximate versions of these algorithms that retrieve the top-k approximate MCCrs with polynomial delay, and we prove that they achieve a 2-approximation. Extensive experiments on two real-world datasets confirm the efficiency and effectiveness of the proposed algorithms.
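The abstract does not spell out the MCCr definition, so the following Python sketch is only an illustration under stated assumptions. It assumes the usual r-clique notion from earlier work (a set of nodes whose pairwise shortest-path distances are all at most r) and reads "redundant node" as a node whose removal leaves the set of covered query keywords unchanged. The function names, the adjacency-list layout, and the `keywords_of` mapping are hypothetical and are not taken from the paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted, undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_r_clique(adj, nodes, r):
    """True if every pair of nodes is within distance r (assumed definition)."""
    for u in nodes:
        dist = bfs_distances(adj, u)
        if any(dist.get(v, float("inf")) > r for v in nodes):
            return False
    return True


def covered(keywords_of, nodes, query):
    """Query keywords that appear on at least one node of the set."""
    found = set()
    for n in nodes:
        found |= keywords_of.get(n, set()) & query
    return found


def has_redundant_node(keywords_of, nodes, query):
    """True if some node can be dropped without losing any covered keyword.

    This is an assumed reading of 'redundant', not the paper's formal test.
    """
    full = covered(keywords_of, nodes, query)
    return any(covered(keywords_of, nodes - {n}, query) == full for n in nodes)


# Toy data: a path graph a - b - c - d with keywords attached to three nodes.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
keywords_of = {"a": {"graph"}, "b": {"graph", "search"}, "c": {"keyword"}}
query = {"graph", "search", "keyword"}

candidate = {"a", "b", "c"}
print(is_r_clique(adj, candidate, 2))                   # pairwise distances <= 2
print(has_redundant_node(keywords_of, candidate, query))  # "a" adds nothing new
print(has_redundant_node(keywords_of, {"b", "c"}, query))
```

On this toy graph, `{"a", "b", "c"}` is a 2-clique that covers the whole query but carries a redundant node, whereas `{"b", "c"}` covers the same keywords with no node to spare, which is the kind of result the minimality requirement is meant to favor.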
