Rtop-k: A keyword proximity search method based on semantic and structural relaxation

Recently, keyword search over XML databases has attracted a great deal of attention. In many applications whose backend data source is powered by an XML database management system, keyword search becomes important for querying XML data when the user does not know the structure, or knows it only partially. Given a keyword query, existing approaches first compute the lowest common ancestors (LCAs), or variants of them, of the XML elements that contain the input keywords, and then identify the subtrees rooted at the LCAs as the answers. However, this method does not satisfy the user's intention well enough: for users, information containing some, but not all, of the keywords may also be useful. In this paper, we address this problem by applying relaxed structural queries during the XML keyword search procedure, and we progressively obtain the top-k answers of keyword proximity search by analyzing the semantic and structural information of the queries. We propose a transformation framework that derives the structural queries by analyzing the given keywords and the structural information of the XML database. In addition, we propose a scoring method that considers the user's preferences, and finally we design an architecture (Rtop-k) to adaptively and efficiently identify the top-k relevant answers to a query. The performance of the technique, as well as its recall and precision, was measured experimentally. These experiments indicate that our system is efficient and ranks quality results highly.
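
To make the contrast between strict LCA-style answers and relaxed answers concrete, the following is a minimal sketch and not the Rtop-k architecture itself. The sample document, the function names (`matched_keywords`, `smallest_containing`, `relaxed_top_k`), and the toy score (fraction of keywords matched, with ties broken by smaller subtree size) are all assumptions made for illustration; the scoring method and relaxation framework of the paper are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """
<bib>
  <book>
    <title>XML keyword search</title>
    <author>Chen</author>
  </book>
  <book>
    <title>Relational databases</title>
    <author>Wang</author>
  </book>
</bib>
"""


def matched_keywords(node, keywords):
    """Return the subset of keywords that occur in the text of node's subtree."""
    text = " ".join(node.itertext()).lower()
    return {k for k in keywords if k.lower() in text}


def smallest_containing(root, keywords):
    """Strict answers: elements whose subtree contains ALL keywords
    while no child subtree does (a smallest-LCA-style result)."""
    wanted = set(keywords)
    answers = []
    for node in root.iter():
        if matched_keywords(node, keywords) != wanted:
            continue
        if not any(matched_keywords(c, keywords) == wanted for c in node):
            answers.append(node)
    return answers


def relaxed_top_k(root, keywords, k):
    """Relaxed answers: elements matching SOME keywords, ranked by an
    assumed toy score (fraction matched, then smaller subtree first)."""
    scored = []
    for node in root.iter():
        hits = matched_keywords(node, keywords)
        if not hits:
            continue
        # Keep only the smallest element for a given set of matched keywords.
        if any(matched_keywords(c, keywords) == hits for c in node):
            continue
        size = sum(1 for _ in node.iter())
        scored.append((len(hits) / len(keywords), -size, node))
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [(node.tag, score) for score, _, node in scored[:k]]


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print([n.tag for n in smallest_containing(root, ["xml", "chen"])])
    print([n.tag for n in smallest_containing(root, ["xml", "wang"])])
    print(relaxed_top_k(root, ["xml", "wang"], 3))
```

In this sketch, the query "xml wang" has no compact strict answer: the only element containing both keywords is the document root, so the strict result is the entire tree. The relaxed ranking still returns that root, but it also surfaces the small elements that match a single keyword, which is the kind of partially matching information the abstract argues may be useful to users.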
