Beyond Bag of Words: A New Model for XML Keyword Query

Keyword search is an effective paradigm for information discovery and has recently been introduced for querying XML documents. Effective keyword search over XML documents requires a full understanding of the keyword query. The traditional bag-of-words model cannot differentiate the roles of keywords or the relationships between them, and is therefore not suitable for XML keyword queries. In this paper, we present a novel model specially designed for XML keyword queries. The model takes a very different view of a keyword query: the query is interpreted as a composition of several query units, each representing a query condition. We believe that this viewpoint captures the semantics of the query. To obtain an objective measure of the relevance of results with respect to the query, we devise a scoring method based on the proposed model that accounts for query semantics as well as the structural properties of XML documents. Experimental results verify the effectiveness of our methods.
