A general framework to resolve the MisMatch problem in XML keyword search

When users issue a query to a database, they have expectations about the results. If what they search for is unavailable in the database, the system returns either an empty result or, worse, erroneous mismatch results. We call this the MisMatch problem. In this paper, we solve the MisMatch problem in the context of XML keyword search. Our solution is based on two novel concepts that we introduce: Target Node Type and Distinguishability. Target Node Type represents the type of node a query result intends to match, and Distinguishability measures the importance of the query keywords. Using these concepts, we develop a low-cost post-processing algorithm that runs on the results of query evaluation to detect the MisMatch problem and generate helpful suggestions for users. Our approach has three noteworthy features: (1) for queries with the MisMatch problem, it outputs an explanation, suggested queries and their sample results, helping users judge whether the MisMatch problem is solved without reading all query results; (2) it is portable, as it works with any lowest common ancestor-based matching semantics (for XML data without ID references) or minimal Steiner tree-based matching semantics (for XML data with ID references) that return tree structures as results, and it is orthogonal to the choice of result retrieval method; (3) it is lightweight, accounting for only a very small proportion of the total query evaluation time. Extensive experiments on three real datasets verify the effectiveness, efficiency and scalability of our approach. A search engine called XClear has been built and is available at http://xclear.comp.nus.edu.sg.
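To make the phrase "lowest common ancestor-based matching semantics" concrete, the following is a minimal sketch of one such semantics, smallest LCA, computed by brute force over Dewey labels. It is not the post-processing algorithm of this paper and does not implement Target Node Type or Distinguishability. The function names, the use of tuples as Dewey labels, and the tiny sample document are all assumptions made for illustration.

```python
from itertools import product

def lca(labels):
    """LCA of nodes given as Dewey labels (tuples): their longest common prefix."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)

def smallest_lcas(match_lists):
    """Brute-force smallest LCAs.

    match_lists holds one list of matching node labels per query keyword.
    A candidate is the LCA of one match node per keyword; a candidate is
    kept only if no other candidate lies strictly below it in the tree.
    """
    candidates = {lca(combo) for combo in product(*match_lists)}

    def is_proper_ancestor(a, d):
        return len(a) < len(d) and d[:len(a)] == a

    return sorted(
        c for c in candidates
        if not any(is_proper_ancestor(c, other) for other in candidates)
    )

# Hypothetical document: root (1,) with two book nodes (1,1) and (1,2).
# Keyword A matches a title under each book; keyword B matches one author.
keyword_a = [(1, 1, 1), (1, 2, 1)]
keyword_b = [(1, 1, 2)]
result = smallest_lcas([keyword_a, keyword_b])
print(result)
```

In this sample the root is also a common ancestor of some keyword matches, but it is discarded because the first book node already covers both keywords in a smaller subtree. A post-processing step of the kind described above would then inspect such result subtrees rather than change how they are retrieved.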
