XFlow: Internet-Scale Distributed Stream Processing

Existing stream processing systems are designed for clustered deployments and cannot adequately meet the scalability and adaptivity requirements of Internet-scale monitoring applications. Furthermore, these systems commonly optimize for a specific QoS metric, which may limit their applicability to diverse applications and environments. This paper presents XFlow, a generic distributed data collection, processing, and dissemination system that addresses these limitations. XFlow integrates a pub-sub model with data flows for stream processing. The underlying pub-sub model decouples sources, clients, and processing operators from one another, leading to a loosely coupled architecture that can scale gracefully, adapt to churn in system membership and workload, and facilitate sophisticated optimizations. We first provide an overview of XFlow’s architecture. We then describe XFlow’s optimization model, which changes the placement and implementation of operators to meet application-specific performance goals and constraints. Finally, we demonstrate XFlow’s flexibility and effectiveness using real-world streams and experimental results obtained from our PlanetLab deployment.
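The pub-sub decoupling described above can be illustrated with a minimal, single-process sketch. It is not XFlow's implementation: the `Broker`, `FilterOperator`, and `choose_host` names, the stream names, and the host metrics are all hypothetical. Sources, operators, and clients interact only through named streams, so any of them can subscribe or unsubscribe without the others being reconfigured. `choose_host` stands in, very loosely, for an optimizer that takes the performance goal as a parameter rather than hard-coding a single QoS metric.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Item = Dict[str, object]
Handler = Callable[[Item], None]


class Broker:
    """Hypothetical in-process pub-sub broker keyed by stream name."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, stream: str, handler: Handler) -> None:
        self._subs[stream].append(handler)

    def unsubscribe(self, stream: str, handler: Handler) -> None:
        # A departing client or operator just drops its subscription;
        # publishers on the stream are unaffected (churn tolerance).
        self._subs[stream].remove(handler)

    def publish(self, stream: str, item: Item) -> None:
        # Deliver to the current subscribers; the publisher never names them.
        for handler in list(self._subs[stream]):
            handler(item)


class FilterOperator:
    """Hypothetical operator: reads one stream, republishes matching items."""

    def __init__(self, broker: Broker, in_stream: str, out_stream: str,
                 predicate: Callable[[Item], bool]) -> None:
        self.broker = broker
        self.out_stream = out_stream
        self.predicate = predicate
        broker.subscribe(in_stream, self.on_item)

    def on_item(self, item: Item) -> None:
        if self.predicate(item):
            self.broker.publish(self.out_stream, item)


def choose_host(hosts: Dict[str, Dict[str, float]],
                cost: Callable[[Dict[str, float]], float]) -> str:
    """Pick the candidate host that minimizes an application-supplied cost."""
    return min(hosts, key=lambda name: cost(hosts[name]))


if __name__ == "__main__":
    broker = Broker()
    alerts: List[Item] = []
    # Source -> "cpu" -> filter operator -> "cpu.high" -> client.
    FilterOperator(broker, "cpu", "cpu.high", lambda t: t["load"] > 0.9)
    broker.subscribe("cpu.high", alerts.append)
    broker.publish("cpu", {"host": "a", "load": 0.95})
    broker.publish("cpu", {"host": "b", "load": 0.10})
    print(alerts)  # [{'host': 'a', 'load': 0.95}]

    hosts = {"n1": {"latency_ms": 20.0, "load": 0.8},
             "n2": {"latency_ms": 45.0, "load": 0.2}}
    print(choose_host(hosts, lambda m: m["latency_ms"]))  # n1
    print(choose_host(hosts, lambda m: m["load"]))        # n2
```

Swapping the cost function from latency to load changes the placement decision without touching any operator or client. A real wide-area system would also need to route items across overlay nodes and migrate operator state, which this sketch omits.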
