A burst resolution technique for data streams management in the real-time data warehouse

Data stream sources have emerged as the traditional data warehouse evolves toward the real-time data warehouse. Several solutions have been proposed to extract, transform and load data streams, but handling bursts of incoming data streams still requires investigation. In this paper, we propose a flow regulation technique that regulates fast, time-varying bursts of data streams. For this purpose, we adapt the token bucket, a simple and flexible mechanism with little overhead. The objective of this research is to minimize the probability of dropping data streams, synchronize processing power, and balance the load of arriving data streams. We propose an algorithm for the flow regulation technique that regulates the data streams efficiently. We evaluated the technique on a synthetic dataset and found that it works well in the presence of bursty data streams.
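To make the token bucket idea concrete, the following is a minimal sketch of a generic token bucket placed in front of a bounded waiting buffer. It is not the algorithm proposed in this paper: the class name `TokenBucketRegulator`, the parameters `rate`, `capacity` and `buffer_size`, and the choice to buffer (rather than immediately drop) tuples that arrive without a token are all assumptions made for illustration. Time is passed in explicitly so the behaviour is deterministic.

```python
from collections import deque


class TokenBucketRegulator:
    """Generic token bucket with a bounded waiting buffer (illustrative only)."""

    def __init__(self, rate, capacity, buffer_size):
        self.rate = rate                # tokens added per second (assumed parameter)
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.tokens = capacity          # start with a full bucket
        self.buffer = deque()           # tuples waiting for a token
        self.buffer_size = buffer_size  # maximum number of waiting tuples
        self.last = 0.0                 # time of the last refill
        self.dropped = 0                # tuples rejected because the buffer was full

    def _refill(self, now):
        # Add tokens for the elapsed time, never exceeding the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def _drain(self):
        # Release waiting tuples in arrival order, one token per tuple.
        released = []
        while self.buffer and self.tokens >= 1:
            self.tokens -= 1
            released.append(self.buffer.popleft())
        return released

    def offer(self, item, now):
        """Accept a tuple arriving at time `now`; return the tuples released downstream."""
        self._refill(now)
        if len(self.buffer) < self.buffer_size:
            self.buffer.append(item)
        else:
            self.dropped += 1           # buffer full: the tuple is lost
        return self._drain()

    def tick(self, now):
        """Advance time without a new arrival; return the tuples released downstream."""
        self._refill(now)
        return self._drain()
```

In this sketch, `capacity` bounds the size of a burst that passes through immediately, `rate` bounds the long-run throughput handed to downstream processing, and `buffer_size` trades memory for a lower drop probability. For example, with `rate=2`, `capacity=3` and `buffer_size=5`, a burst of ten tuples arriving at the same instant releases three at once, holds five for later release at two per second, and drops two.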
