Towards Automated Analysis of Connections Network in Distributed Stream Processing System

Not so long ago, data warehouses processed data sets that were loaded periodically during an ETL (Extraction, Transformation and Loading) process. Two kinds of ETL process could be distinguished: full and incremental. Today we often have to process real-time data and analyse it almost on the fly, so that the analyses are always up to date. There are many possible applications for real-time data warehouses. In most cases two features are important: delivering data to the warehouse as quickly as possible, and not losing any tuple in case of failure. In this paper we describe an architecture for gathering and processing data from geographically distributed data sources, and we define a method for analysing the properties of the connection structure and finding its weakest points in the case of single and multiple node failures. At the end of the paper we briefly describe our plans for future work.
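To make the idea of finding the weakest points of a connection structure more concrete, the sketch below shows one simple way such an analysis could be done. It is an illustration under stated assumptions, not the method defined in the paper: the connection network is modelled as a directed graph from data sources through intermediate processing nodes to a single warehouse, and a set of failed nodes is scored by the number of sources that can no longer reach the warehouse. The topology (`EDGES`, `SOURCES`, `SINK`) and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical topology: three sources, four processing nodes, one warehouse "w".
EDGES = [
    ("s1", "a"), ("s2", "a"), ("s2", "b"), ("s3", "b"),
    ("a", "c"), ("b", "c"), ("b", "d"),
    ("c", "w"), ("d", "w"),
]
SOURCES = ["s1", "s2", "s3"]
SINK = "w"


def reachable_sources(edges, sources, sink, failed=frozenset()):
    """Return the sources that can still reach the sink when `failed` nodes are down."""
    # Reverse the surviving edges so that one BFS from the sink
    # finds every node that still has a path to it.
    reverse = {}
    for src, dst in edges:
        if src in failed or dst in failed:
            continue
        reverse.setdefault(dst, []).append(src)
    seen = {sink}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for prev in reverse.get(node, []):
            if prev not in seen:
                seen.add(prev)
                queue.append(prev)
    return {s for s in sources if s in seen}


def weakest_points(edges, sources, sink, k=1):
    """Rank every set of k intermediate nodes by how many sources its failure cuts off."""
    nodes = {n for edge in edges for n in edge}
    candidates = sorted(nodes - set(sources) - {sink})
    ranking = []
    for failed in combinations(candidates, k):
        alive = reachable_sources(edges, sources, sink, frozenset(failed))
        ranking.append((len(sources) - len(alive), failed))
    # Most damaging failure sets first; ties are broken by node names.
    ranking.sort(key=lambda item: (-item[0], item[1]))
    return ranking


if __name__ == "__main__":
    for k in (1, 2):
        print(f"failures of {k} node(s):", weakest_points(EDGES, SOURCES, SINK, k))
```

Enumerating every set of `k` nodes is exponential in `k`, so this brute-force ranking only suits small networks or small `k`. It also treats every source as equally important, which a real analysis might refine with weights such as data rate.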
