Data stream processing in dynamic and decentralized peer-to-peer networks

Data stream management systems (DSMS) process data streams, i.e., potentially infinite sequences of data sent by active data sources. Distributed DSMS use networks of interconnected machines to increase processing power. Typically, they run on clusters of homogeneous, non-autonomous machines. In some applications, however, a cluster of computers is unavailable or infeasible, too expensive to acquire, or too complex to deploy. An alternative is to use a collection of notebooks, personal computers, or smartphones. The resulting network consists solely of autonomous and heterogeneous machines and is therefore dynamic and decentralized, which distributed data stream processing must take into account. In this paper, I present my PhD project, which aims to develop and deploy a distributed DSMS that runs in a peer-to-peer (P2P) network of autonomous and heterogeneous peers. The approach addresses three main challenges: data source management, continuous query distribution, and distributed query management. A prototype has already been implemented, and its evaluation is planned.
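To make "continuous query distribution" and "distributed query management" more concrete, the following is a minimal sketch, assuming that each operator of a continuous query has a known cost and that each autonomous peer advertises how much work it is willing to accept. The names `Operator`, `Peer`, `place`, `handle_departure`, and `assignment`, the cost units, and the greedy "most free capacity first" heuristic are all hypothetical illustrations, not the method developed in this project. A real P2P DSMS would also have to weigh network latency, the location of data sources, and each peer's autonomy.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """One operator of a continuous query with a hypothetical estimated cost."""
    name: str
    cost: float


@dataclass
class Peer:
    """A hypothetical autonomous peer advertising how much work it will accept."""
    name: str
    capacity: float
    operators: list = field(default_factory=list)

    @property
    def free(self) -> float:
        # Remaining capacity after the operators already placed on this peer.
        return self.capacity - sum(op.cost for op in self.operators)


def place(operators, peers):
    """Greedily assign operators, most expensive first, to the peer with the most free capacity."""
    for op in sorted(operators, key=lambda o: o.cost, reverse=True):
        target = max(peers, key=lambda p: p.free)  # ties go to the first peer in the list
        if target.free < op.cost:
            raise ValueError(f"no peer can host operator {op.name!r}")
        target.operators.append(op)


def handle_departure(peers, departed_name):
    """Re-place the operators of a peer that left the network; return the remaining peers."""
    departed = next(p for p in peers if p.name == departed_name)
    remaining = [p for p in peers if p is not departed]
    place(departed.operators, remaining)
    return remaining


def assignment(peers):
    """Map each peer name to the names of the operators it currently hosts."""
    return {p.name: [op.name for op in p.operators] for p in peers}


# Hypothetical query and peers; costs and capacities are arbitrary units.
query = [Operator("source", 1.0), Operator("filter", 2.0),
         Operator("window_avg", 4.0), Operator("sink", 1.0)]
peers = [Peer("smartphone", 3.0), Peer("notebook", 6.0), Peer("desktop", 8.0)]

place(query, peers)
before = assignment(peers)
peers = handle_departure(peers, "notebook")  # the notebook leaves the P2P network
after = assignment(peers)
print(before)
print(after)
```

Here the greedy rule stands in for whatever placement strategy a real system would use; its only purpose is to show that a query must be split across peers of different capacity and re-split when a peer leaves the network at runtime.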
