Rethinking the design of distributed stream processing systems

In this paper, we present a novel architecture that supports large-scale stream processing services in a widely distributed environment. The proposed system, COSMOS, distinguishes itself by its loose coupling and communication efficiency. To exploit data transfers that can be shared among different queries, and to break the tight coupling between distributed nodes, COSMOS employs a new communication paradigm: content-based networking. We discuss the design of the system and the challenges it raises.
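The abstract does not specify how COSMOS routes data. The Python below is a minimal sketch of the general content-based networking idea, assuming that queries register their interests as attribute-range predicates and that each node keeps a per-neighbor table of those predicates. It shows how several queries can share one transfer: a tuple crosses a link once if it satisfies any query downstream of that link. The names `Router`, `subscribe`, `route`, and `matches`, and the predicate format, are hypothetical and not taken from COSMOS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical predicate format: a conjunction of closed ranges over tuple
# attributes, e.g. {"temp": (30.0, 50.0)} matches tuples with 30 <= temp <= 50.
Predicate = Dict[str, Tuple[float, float]]


def matches(pred: Predicate, tup: Dict[str, float]) -> bool:
    """True if every constrained attribute is present and within its range."""
    return all(attr in tup and lo <= tup[attr] <= hi
               for attr, (lo, hi) in pred.items())


@dataclass
class Router:
    """Hypothetical content-based router: forwarding is decided by tuple
    content, not by the addresses of the querying nodes."""
    name: str
    # Neighbor name -> predicates of the queries reachable through that neighbor.
    interests: Dict[str, List[Predicate]] = field(default_factory=dict)

    def subscribe(self, neighbor: str, pred: Predicate) -> None:
        """Record that a query behind `neighbor` wants tuples matching `pred`."""
        self.interests.setdefault(neighbor, []).append(pred)

    def route(self, tup: Dict[str, float]) -> List[str]:
        """Return the neighbors that should receive `tup`.

        Each neighbor appears at most once, however many downstream queries
        the tuple satisfies; this is where the data transfer is shared.
        """
        return [nb for nb, preds in self.interests.items()
                if any(matches(p, tup) for p in preds)]


r = Router("source")
r.subscribe("n1", {"temp": (30.0, 50.0)})      # query q1
r.subscribe("n1", {"temp": (40.0, 60.0)})      # query q2, same downstream link
r.subscribe("n2", {"humidity": (0.0, 20.0)})   # query q3
print(r.route({"temp": 45.0, "humidity": 50.0}))  # ['n1']
```

Here queries q1 and q2 both lie behind neighbor `n1`, so a tuple with `temp = 45.0` crosses that link only once. A real deployment would also have to merge and propagate subscriptions between routers, which this sketch omits.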
