Automatically Generating Structured Queries in XML Keyword Search

In this paper, we present a novel method for automatically deriving structured XML queries from keyword-based queries and show how it was applied to the experimental tasks proposed for the INEX 2010 data-centric track. In our method, called StruX, the user specifies a schema-independent, unstructured keyword-based query, and StruX automatically generates a top-k ranking of schema-aware queries based on a target XML database. One of the top-ranked structured queries can then be selected, automatically or by the user, and executed by an XML query engine. The generated structured queries are XPath expressions consisting of an entity path (e.g., /dblp/article) and predicates (e.g., [author="john" and title="xml"]), which together form queries such as /dblp/article[author="john" and title="xml"]. We use the concept of entity, commonly adopted in the XML keyword search literature, to define suitable root nodes for the query results. StruX also uses IR techniques to determine the elements in which each query term is most likely to occur.
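The following Python sketch illustrates the general idea of turning a keyword query into ranked entity-path-plus-predicate XPath queries. It rests on simplifying assumptions that are not taken from the paper. The candidate entity paths and per-element term counts (`STATS`) are hypothetical toy data. Each keyword's likelihood of occurring in an element is estimated with an add-one-smoothed term model. Candidate queries are ranked by the sum of those log-probabilities. The names `STATS`, `log_prob`, `to_xpath`, and `top_k_queries` are illustrative only, and the scoring function is a stand-in because the abstract does not specify StruX's actual one.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical toy statistics (not from the paper): for each candidate entity
# path, the term counts observed in each of its child elements.
STATS: Dict[str, Dict[str, Counter]] = {
    "/dblp/article": {
        "author": Counter({"john": 3, "smith": 2}),
        "title": Counter({"xml": 5, "search": 4}),
    },
    "/dblp/inproceedings": {
        "author": Counter({"mary": 2}),
        "title": Counter({"xml": 2, "keyword": 3}),
    },
}


def log_prob(term: str, counts: Counter) -> float:
    """Smoothed log P(term | element); add-one smoothing is an assumption."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen terms
    return math.log((counts[term] + 1) / (total + vocab))


def to_xpath(entity: str, assignment: Dict[str, List[str]]) -> str:
    """Render an entity path plus one predicate per element, in the abstract's notation."""
    preds = []
    for elem, terms in assignment.items():
        value = " ".join(terms)  # terms mapped to the same element share one predicate
        preds.append(f'{elem}="{value}"')
    return f"{entity}[{' and '.join(preds)}]"


def top_k_queries(keywords: List[str], k: int = 3) -> List[Tuple[float, str]]:
    """Score every keyword-to-element assignment for every entity; keep the k best."""
    scored: List[Tuple[float, str]] = []
    for entity, elements in STATS.items():
        names = list(elements)
        # Enumerate each way of assigning every keyword to one element of this entity.
        for choice in product(names, repeat=len(keywords)):
            score = sum(log_prob(t, elements[e]) for t, e in zip(keywords, choice))
            assignment: Dict[str, List[str]] = {}
            for term, elem in zip(keywords, choice):
                assignment.setdefault(elem, []).append(term)
            scored.append((score, to_xpath(entity, assignment)))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

With this toy data, `top_k_queries(["john", "xml"])` ranks `/dblp/article[author="john" and title="xml"]` first, reproducing the example above. Exhaustive enumeration is used only for clarity; it grows exponentially with the number of keywords, so a practical system would need to prune the candidate assignments.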
