An Evaluation Study of Search Algorithms for XML Streams

Keyword-based search services over XML streams are essential for widely used streaming applications, such as dissemination services, sensor networks, and stock market quotes. However, XML stream keyword search algorithms are usually schema dependent and do not allow pure keyword queries. Furthermore, ranking methods remain relatively unexplored in such algorithms. This paper presents an accuracy and performance study of two keyword-based search algorithms for XML streams. Our study compares the two algorithms using an XPath benchmark as the source of data and queries. We also consider a large collection of XML documents and a large set of random queries, both based on the DBLP dataset. Finally, we propose a strategy that combines both algorithms and ranks the keyword-based search results.
