EFilter: An Efficient Filter for Supporting Twig Query Patterns in XML Streams

With the rapid development of the Internet, XML (eXtensible Markup Language) has become the standard format for data representation and exchange on the Internet. In many applications, XML files are transferred in the form of continuous streams. For example, in publish-subscribe systems, data is recorded in XML format and the conditions of users' subscriptions are expressed as queries. Thus, how to filter a continuous stream of XML documents against a large number of queries is an important issue. In this paper, we propose an efficient filter called EFilter to support twig query patterns in XML streams. Users' queries are recorded in a compressed tree structure called Query Guide and a hash table called QLinkedList. Through a bottom-up search of Query Guide, the XML documents are processed only once as they arrive. Experimental results show that EFilter is more efficient than FiST (Kwon et al., 2005) and SFilter (Nizar, Babu and Kumar, 2009) in terms of filtering speed.