Processing keyword search on XML: a survey

Keyword search is a user-friendly way to retrieve information from XML data. Because an XML document can be large and contain a great deal of information, a keyword search result should be a fragment of the document that is constructed dynamically at query time, which the structure of XML makes possible. Processing keyword searches on XML poses several challenges: Which elements of the XML document are relevant to the query? How can results be generated efficiently and ranked meaningfully? How should results be presented so that the user can quickly find the desired information? In this survey, we review the papers in the literature that address these problems. We divide the existing approaches into several classes according to the problem they tackle and present a comprehensive analysis of these works.
