Efficient Filtering of RSS Documents on Computer Cluster

RSS filtering has become increasingly important as the amount of information on the Web grows. Many tools aggregate and manipulate content from around the Web based on the RSS format. Today, clusters are the infrastructure of choice for many large Internet service providers. In this paper we develop algorithms that enable efficient filtering of RSS documents, which use a graph-structured data format, on a computing cluster. We propose indexing and filtering algorithms and suggest several optimizations. The results indicate that system throughput increases to 400% on a cluster infrastructure compared with a non-clustered, centralized implementation. In general, we observe that the filtering performance of our algorithms scales linearly with the number of compute nodes in the cluster.
