Summarizing Answer Graphs Induced by Keyword Queries

Keyword search is widely used to query graph data. Because keyword queries lack structural support, a single query may generate an excessive number of matches, referred to as "answer graphs", that capture different relationships among the keywords. An overlooked yet important task is to group and summarize answer graphs that share similar structure and content, so that queries are easier to interpret and results easier to understand. This paper studies the summarization problem for the answer graphs induced by a keyword query Q. (1) A notion of summary graph is proposed to characterize the summarization of answer graphs. Given Q and a set of answer graphs G, a summary graph preserves the relationships among the keywords in Q by summarizing the paths that connect the keyword nodes in G. (2) A quality metric for summary graphs, called the coverage ratio, is developed to measure the information loss of summarization. (3) Based on this metric, a set of summarization problems is formulated, each of which aims to find a minimized summary graph with a given coverage ratio. (a) We show that the complexity of these summarization problems ranges from PTIME to NP-complete. (b) We provide exact and heuristic summarization algorithms. (4) Using real-life and synthetic graphs, we experimentally verify the effectiveness and efficiency of our techniques.
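To make the ideas of a summary graph and a coverage ratio concrete, the following Python sketch illustrates one plausible reading of them. It is not the paper's construction: the abstract does not define either notion formally, so the sketch assumes (i) a naive summary that merges nodes sharing a label across all answer graphs, treated as undirected, and (ii) a coverage ratio equal to the fraction of keyword pairs that remain connected in the summary. The function names `build_summary` and `coverage_ratio`, the `keep_labels` parameter, and the sample data are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_summary(answer_graphs, keep_labels=None):
    """Merge same-label nodes of all answer graphs into one undirected graph.

    Each answer graph is a (labels, edges) pair: labels maps node id -> label,
    edges is a list of (u, v) node-id pairs. If keep_labels is given, only
    edges whose two endpoint labels are both in keep_labels are retained
    (an assumed way of producing a smaller, lossier summary).
    """
    summary = defaultdict(set)
    for labels, edges in answer_graphs:
        for u, v in edges:
            a, b = labels[u], labels[v]
            if keep_labels is not None and not {a, b} <= keep_labels:
                continue
            summary[a].add(b)
            summary[b].add(a)
    return dict(summary)


def connected(summary, source, target):
    """Breadth-first search over the summary graph."""
    if source not in summary or target not in summary:
        return False
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in summary.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def coverage_ratio(summary, keywords):
    """Assumed metric: fraction of keyword pairs connected in the summary."""
    pairs = list(combinations(sorted(keywords), 2))
    if not pairs:
        return 1.0  # nothing to cover
    covered = sum(connected(summary, a, b) for a, b in pairs)
    return covered / len(pairs)


# Hypothetical query and answer graphs (keyword nodes carry the keyword as label).
Q = {"Jaguar", "America", "history"}
g1 = ({1: "Jaguar", 2: "company", 3: "America"}, [(1, 2), (2, 3)])
g2 = ({1: "Jaguar", 2: "habitat", 3: "America", 4: "history"},
      [(1, 2), (2, 3), (3, 4)])

full = build_summary([g1, g2])
small = build_summary([g1, g2], keep_labels={"Jaguar", "company", "America"})
print(coverage_ratio(full, Q), coverage_ratio(small, Q))
```

In this toy setting the full summary keeps every keyword pair connected, while the restricted summary is smaller but loses the pairs involving "history". That trade-off between summary size and coverage is the kind of tension the summarization problems in contribution (3) are meant to resolve. Note that merging by label can also introduce paths that exist in no single answer graph, which a faithful summary would need to rule out.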
