Distributed XML stream filtering system with high scalability

We propose a distributed XML stream filtering system that filters XML streams against a large number of subscriber profiles, written as XPath expressions, and publishes the filtered data in real time. To realize the system, we define features of XPath expressions with respect to the XML data and use them to forecast the load on each filtering server. Our method combines two techniques: one shares the total transfer load among the filtering servers, and the other equalizes the sum of the overlap sizes between filtering servers. Experiments show that publishing time grows with the number of XPath expressions at one third the rate observed with the round-robin method. Furthermore, the overhead of the proposed method is quite low.
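The contrast with round-robin assignment can be illustrated with a minimal sketch. It is not the system's actual algorithm: it assumes each XPath profile already has a forecast load (the numbers below are hypothetical), it balances only that single load value, and it ignores the overlap term entirely. The function names `round_robin` and `greedy_least_loaded` are illustrative.

```python
import heapq


def round_robin(loads, n_servers):
    """Assign profiles to servers in arrival order, ignoring forecast load."""
    totals = [0.0] * n_servers
    for i, load in enumerate(loads):
        totals[i % n_servers] += load
    return totals


def greedy_least_loaded(loads, n_servers):
    """Assign each profile, heaviest first, to the currently least-loaded server.

    This is a generic greedy heuristic used here only for illustration;
    the forecast loads are assumed to be given.
    """
    # Min-heap of (accumulated load, server index).
    heap = [(0.0, s) for s in range(n_servers)]
    heapq.heapify(heap)
    for load in sorted(loads, reverse=True):
        total, server = heapq.heappop(heap)
        heapq.heappush(heap, (total + load, server))
    totals = [0.0] * n_servers
    for total, server in heap:
        totals[server] = total
    return totals


# Hypothetical forecast loads for six XPath profiles.
forecast = [9, 1, 8, 2, 7, 3]
rr_totals = round_robin(forecast, 2)
balanced_totals = greedy_least_loaded(forecast, 2)
print(rr_totals, balanced_totals)
```

With these made-up loads, round-robin leaves one server with four times the load of the other, while the load-aware assignment splits the total evenly. A fuller sketch would also need a cost for the overlap between servers, which is left out here.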
