Optimization of operator partitions in stream data warehouse

Optimizing memory use and processing time is a key task in Stream Data Warehouses (SDWs). StrETL processes in these systems resemble queries in Data Stream Management Systems (DSMSs), which allows some methods to be migrated from DSMSs to SDWs. We have observed that schedulers and the algorithms that create operator partitions are analyzed separately, whether in StrETL processes or in stream queries. In fact, the two mechanisms affect each other, so it is justified to study the potential benefits of combining them. In this paper we introduce a solution that cooperates with a scheduler in order to create more efficient operator partitions. The algorithm is also able to optimize a wider range of operator topologies. Finally, experimental evaluation shows that our solution achieves lower memory consumption or a shorter response time than the competing strategies.
