MAXLCA: A New Query Semantic Model for XML Keyword Search

Keyword search lets web users access XML data without understanding its complex schemas. However, keyword queries are inherently ambiguous, which makes it difficult to select the data nodes that properly match the keywords. To address this challenge in XML datasets whose documents have a relatively small average size, we present a new keyword query semantic model: MAXimal Lowest Common Ancestor (MAXLCA). MAXLCA effectively avoids the false-negative problem observed in ELCA, SLCA, and XSeek. Furthermore, we construct an algorithm, GMAX, for MAXLCA-based queries, which our evaluations show to be efficient. Experiments on INEX show that a search engine using MAXLCA and GMAX outperforms the compared approaches on all three criteria: effectiveness, efficiency, and processing scalability.
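
For readers unfamiliar with the LCA-based semantics mentioned above, the following is a minimal sketch of the SLCA baseline only. It does not implement MAXLCA or GMAX, which are not defined in this abstract. It assumes nodes are identified by Dewey labels (tuples of child positions), and the function names `lca` and `slca` and the toy document are hypothetical, chosen purely for illustration.

```python
from itertools import product


def lca(*labels):
    """Lowest common ancestor of Dewey labels = their longest common prefix."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def slca(match_lists):
    """Brute-force SLCA: LCAs of all keyword-match combinations,
    keeping only those with no proper descendant among the candidates.
    (Illustrative only; exponential in the number of keywords.)"""
    candidates = {lca(*combo) for combo in product(*match_lists)}

    def has_descendant(node):
        return any(
            len(c) > len(node) and c[: len(node)] == node for c in candidates
        )

    return sorted(c for c in candidates if not has_descendant(c))


# Toy document (assumed): root (0,) has two papers, (0, 0) and (0, 1).
# Both titles contain "xml"; only the first paper has author "chen".
xml_matches = [(0, 0, 0), (0, 1, 0)]
chen_matches = [(0, 0, 1)]
print(slca([xml_matches, chen_matches]))  # [(0, 0)]
```

In this toy case the root also contains both keywords, but it is discarded because the first paper, its descendant, already does. Under SLCA-style pruning, only the smallest such subtrees survive.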
