Supporting Generic Cost Models for Wide-Area Stream Processing

Existing stream processing systems are optimized for a specific metric, which may limit their applicability to diverse applications and environments. This paper presents XFlow, a generic data stream collection, processing, and dissemination system that addresses this limitation efficiently. XFlow can express and optimize a variety of performance metrics and constraints by distributing stream processing queries across a wide-area network. It uses metric-independent, decentralized algorithms that operate on localized, aggregated statistics while avoiding local optima. To facilitate lightweight dynamic changes to the query deployment, XFlow relies on a loosely coupled, flexible architecture consisting of multiple publish-subscribe overlay trees that can gracefully scale and adapt to changes in network and workload conditions. Based on the desired performance goals, the system progressively refines the query deployment, the structure of the overlay trees, and the statistics collection process. We provide an overview of XFlow's architecture and discuss its decentralized optimization model. We demonstrate its flexibility and effectiveness using real-world streams and experimental results obtained from XFlow's deployment on PlanetLab. The experiments show that XFlow can effectively optimize various performance metrics under varying network and workload conditions.
