LAF: a new XML encoding and indexing strategy for keyword‐based XML search

As a large number of corpuses are represented, stored and published in XML format, how to find useful information from XML databases has become an increasingly important issue. Keyword search enables web users to easily access XML data without the need to learn a structured query language or to study complex data schemas. Most existing indexing strategies for XML keyword search are based upon Dewey encoding. In this paper, we proposed a new encoding method called Level Order and Father (LAF) for XML documents. With LAF encoding, we devised a new index structure, called two‐layer LAF inverted index, which can greatly decrease the space complexity compared with Dewey encoding‐based inverted index. Furthermore, with two‐layer LAF inverted index, we proposed a new keyword query algorithm called Algorithm based on Binary Search (ABS) that can quickly find all Smallest Lowest Common Ancestor. We experimentally evaluate two‐layer LAF inverted index and ABS algorithm on four real XML data sets selected from Wikipedia. The experimental results prove the advantages of our index method and querying algorithm. The space consumed by two‐layer LAF index is less than half of that consumed by Dewey inverted index. Moreover, ABS is about one to two orders of magnitude faster than the classic Stack algorithm. Concurrency and Computation: Practice and Experience, 2012.© 2012 Wiley Periodicals, Inc.

[1]  Chun Zhang,et al.  Storing and querying ordered XML using a relational database system , 2002, SIGMOD '02.

[2]  Jianxin Li,et al.  Fast ELCA computation for keyword queries on XML data , 2010, EDBT '10.

[3]  Yi Chen,et al.  XSeek: A Semantic XML Search Engine Using Keywords , 2007, VLDB.

[4]  Rémi Gilleron,et al.  Retrieving meaningful relaxed tightest fragments for XML keyword search , 2009, EDBT '09.

[5]  Shiwei Tang,et al.  Adaptive Top-k Algorithm in SLCA-Based XML Keyword Search , 2010, 2010 12th International Asia-Pacific Web Conference.

[6]  Christian Mathis,et al.  Node labeling schemes for dynamic XML documents reconsidered , 2007, Data Knowl. Eng..

[7]  Yannis Papakonstantinou,et al.  Supporting top-K keyword search in XML databases , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[8]  Feng Shao,et al.  XRANK: ranked keyword search over XML documents , 2003, SIGMOD '03.

[9]  Patrick E. O'Neil,et al.  ORDPATHs: insert-friendly XML node labels , 2004, SIGMOD '04.

[10]  Yannis Papakonstantinou,et al.  Efficient keyword search for smallest LCAs in XML databases , 2005, SIGMOD '05.

[11]  Ziyang Liu,et al.  Query biased snippet generation in XML search , 2008, SIGMOD Conference.

[12]  Chee Yong Chan,et al.  Multiway SLCA-based keyword search in XML data , 2007, WWW '07.

[13]  Yannis Papakonstantinou,et al.  Efficient LCA based keyword search in xml data , 2007, CIKM '07.

[14]  Yi Chen,et al.  Reasoning and identifying relevant matches for XML keyword search , 2008, Proc. VLDB Endow..

[15]  Hang Yu,et al.  A new indexing strategy for XML keyword search , 2010, 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery.

[16]  Jianyong Wang,et al.  Effective keyword search for valuable lcas over xml documents , 2007, CIKM '07.

[17]  Nils Grimsmo,et al.  Faster Path Indexes for Search in XML Data , 2008, ADC.

[18]  Guoliang Li,et al.  SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents , 2009, Inf. Sci..

[19]  Yi Chen,et al.  Identifying meaningful return information for XML keyword search , 2007, SIGMOD '07.

[20]  Tok Wang Ling,et al.  Effective XML Keyword Search with Relevance Oriented Ranking , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[21]  James Allan,et al.  A survey in indexing and searching XML documents , 2002, J. Assoc. Inf. Sci. Technol..

[22]  Jianyong Wang,et al.  Finding and ranking compact connected trees for effective keyword proximity search in XML documents , 2010, Inf. Syst..

[23]  Tok Wang Ling,et al.  Towards an Effective XML Keyword Search , 2010, IEEE Transactions on Knowledge and Data Engineering.

[24]  Yi Chen,et al.  Improving XML search by generating and utilizing informative result snippets , 2010, TODS.

[25]  Ziyang Liu,et al.  Return specification inference and result clustering for keyword search on XML , 2010, TODS.

[26]  Tok Wang Ling,et al.  DDE: from dewey to a fully dynamic XML labeling scheme , 2009, SIGMOD Conference.