ICRA: Effective Semantics for Ranked XML Keyword Search

Keyword search is a user-friendly way to query XML databases. Most previous efforts in this area focus on keyword proximity search in XML based on either a tree data model or a graph (or digraph) data model. The tree data model for XML is generally simple and efficient for keyword proximity search. However, it cannot capture connections such as ID references in XML databases. In contrast, techniques based on the graph (or digraph) data model capture such connections, but are generally inefficient to compute. In this paper, we propose an interconnected object trees model for keyword search that achieves the efficiency of the tree model while capturing connections such as ID references in XML by fully exploiting the properties and schema information of XML databases. In particular, we propose ICA (Interested Common Ancestor) semantics to find all predefined interested objects that contain all query keywords. We also introduce novel IRA (Interested Related Ancestors) semantics to capture the conceptual connections between interested objects and to include more objects that contain only some of the query keywords. Then, a novel ranking metric, RelevanceRank, is studied to dynamically assign higher ranks to objects that are more relevant to a given keyword query according to the conceptual connections in IRAs. We design and analyze efficient algorithms for keyword search based on our data model, and experimental results show that our approach is efficient and outperforms most existing systems in terms of result quality. A prototype of our ICRA system (ICRA = ICA + IRA) on the updated 321M DBLP data is available at http://xmldb.ddns.comp.nus.edu.sg/.
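
As a minimal illustration of the ICA idea described above (not the paper's algorithm or index structures), the sketch below assumes each predefined interested object has already been flattened into a list of the terms found in its subtree. The function name `ica_matches`, the `objects` mapping, and the sample object ids are hypothetical and introduced only for this example. It returns the objects that contain every query keyword.

```python
def ica_matches(objects, keywords):
    """Return ids of interested objects whose terms include every keyword.

    `objects` maps a (hypothetical) object id to the list of terms in its
    subtree; matching is case-insensitive. This is an illustrative filter,
    not the indexing or ranking scheme used by the ICRA system.
    """
    wanted = {k.lower() for k in keywords}
    return sorted(
        obj_id
        for obj_id, terms in objects.items()
        if wanted <= {t.lower() for t in terms}  # all keywords present
    )


# Toy data: three "interested objects" with their subtree terms (assumed).
objects = {
    "paper1": ["XML", "keyword", "search"],
    "paper2": ["XML", "ranking"],
    "paper3": ["keyword", "search", "XML", "graph"],
}

result = ica_matches(objects, ["xml", "keyword"])
```

Objects such as `paper2`, which contain only some of the query keywords, are excluded here; the abstract indicates that IRA semantics is what brings such objects back in through their conceptual connections to other interested objects.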
