A query refinement framework for XML keyword search

Existing work on XML keyword search focuses on finding relevant and meaningful data fragments for a query, assuming that every keyword is intended as part of it. In practice, however, user queries often contain irrelevant or mismatched terms, typos, etc., which can easily lead to empty or meaningless results. In this paper, we introduce the problem of content-aware XML keyword query refinement: while processing a user query Q, the search engine should judiciously decide whether Q needs to be refined and, if so, find a list of promising refined query candidates that are guaranteed to have meaningful matching results over the XML data, without any user interaction or a second try. To achieve this goal, we build a novel content-aware XML keyword query refinement framework consisting of two core parts: (1) a query ranking model that evaluates the quality of a refined query RQ by capturing the morphological/semantic similarity between Q and RQ and the dependency among the keywords of RQ over the XML data; (2) an integrated treatment of the exploration of RQ candidates and the generation of their matching results as a single problem, which is solved optimally within a one-time scan of the related keyword inverted lists. Finally, an extensive empirical study verifies the efficiency and effectiveness of our framework.
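
To make the idea of content-aware refinement concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a toy inverted index from keywords to Dewey labels (hypothetical data), uses the string similarity of Python's `difflib` as a stand-in for morphological similarity, and treats "all keywords occur under a common non-root subtree" as a stand-in for a meaningful match. The keyword dependency score and the single-scan evaluation described above are not modeled; the names `INDEX`, `substitutes`, `has_meaningful_match` and `refine` are illustrative only.

```python
from difflib import SequenceMatcher
from itertools import product

# Toy inverted lists (hypothetical data): keyword -> Dewey labels of the
# XML nodes that contain it. Label (1, 2, 1) is the first child of the
# second child of the root.
INDEX = {
    "xml": [(1, 1, 1), (1, 2, 1)],
    "keyword": [(1, 1, 2)],
    "search": [(1, 1, 2), (1, 2, 2)],
    "database": [(1, 2, 1)],
}


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for morphological similarity."""
    return SequenceMatcher(None, a, b).ratio()


def substitutes(term, threshold=0.8):
    """Indexed terms that may replace `term`, each with a similarity score."""
    if term in INDEX:
        return [(term, 1.0)]
    scored = [(t, similarity(term, t)) for t in INDEX]
    return [(t, s) for t, s in scored if s >= threshold]


def has_meaningful_match(terms):
    """True if some non-root subtree (Dewey prefix of length 2) holds all terms."""
    prefixes = [{label[:2] for label in INDEX[t]} for t in terms]
    return bool(set.intersection(*prefixes))


def refine(query):
    """Return refined queries that have a match in the data, best score first."""
    options = [substitutes(t) for t in query]
    if any(not o for o in options):
        return []  # some keyword has no plausible replacement in the data
    candidates = []
    for combo in product(*options):
        terms = [t for t, _ in combo]
        if has_meaningful_match(terms):
            score = 1.0
            for _, s in combo:
                score *= s  # product of per-keyword similarities
            candidates.append((score, terms))
    return [terms for _, terms in sorted(candidates, reverse=True)]
```

With this toy index, `refine(["xml", "keywrd", "serch"])` yields the single candidate `["xml", "keyword", "search"]`, while `refine(["keyword", "database"])` yields no candidate because the two keywords never occur under the same non-root subtree. Filtering candidates against the data before returning them is what makes the refinement content-aware in this sketch.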
