Finding Missing Answers due to Object Duplication in XML Keyword Search

XML documents often contain duplicated objects so that the data can keep a tree structure. When an object is duplicated, two nodes may have the same object as a child. However, typical LCA (Lowest Common Ancestor) based approaches to XML keyword search do not discover this shared child object, which leads to missing answers. To solve this problem, we propose a new approach that models an XML document as a so-called XML IDREF graph, in which all instances of the same object are linked, so that the missing answers can be found by following these links. Moreover, to make search over the XML IDREF graph efficient, we exploit its hierarchical structure, which lets us generalize the efficient techniques of the LCA-based approaches to the XML IDREF graph. The experimental results show that our approach outperforms the existing approaches in terms of both effectiveness and efficiency.
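
The missing-answer problem can be illustrated with a minimal sketch. It is not the paper's algorithm: the toy document, the object keys such as `course:c1`, and the helpers `lca` and `shared_child_objects` are all hypothetical names chosen for illustration. The sketch assumes that each node carries an object key, so that duplicated instances of the same object can be recognized and treated as linked.

```python
from collections import defaultdict

# Hypothetical toy document: node id -> (parent id, object key, keywords).
# Course c1 is duplicated under two students to keep a tree shape.
NODES = {
    "root": (None, None, set()),
    "s1": ("root", "student:1", {"alice"}),
    "s2": ("root", "student:2", {"bob"}),
    "c1a": ("s1", "course:c1", {"database"}),
    "c1b": ("s2", "course:c1", {"database"}),
    "c2": ("s2", "course:c2", {"networks"}),
}


def ancestors_or_self(node):
    """Return the path from a node up to the root, starting with the node."""
    path = []
    while node is not None:
        path.append(node)
        node = NODES[node][0]
    return path


def lca(a, b):
    """Lowest common ancestor of two nodes in the tree."""
    seen = set(ancestors_or_self(a))
    for n in ancestors_or_self(b):
        if n in seen:
            return n
    return None


def match(keyword):
    """Nodes whose keyword set contains the query keyword."""
    return [n for n, (_, _, kws) in NODES.items() if keyword in kws]


def shared_child_objects(a, b):
    """Objects that are a child of both nodes once duplicates are linked by key."""
    children = defaultdict(set)
    for _, (parent, obj, _) in NODES.items():
        if parent is not None and obj is not None:
            children[parent].add(obj)
    return children[a] & children[b]


alice, bob = match("alice")[0], match("bob")[0]
tree_answer = lca(alice, bob)                     # pure tree view
linked_answer = shared_child_objects(alice, bob)  # duplicates treated as one object
```

In this toy example the tree-only LCA of the two students is the document root, which says nothing about what they share, whereas linking the two instances of the course exposes the common child object. How instances of the same object are actually identified and linked in the XML IDREF graph is specified by the paper itself, not by this sketch.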
