An algorithm for joining a data stream with a disk-resident relation

Current data integration approaches are moving towards real-time updates. One important element of real-time data integration is the join of a continuous incoming data stream with a disk-resident relation. Because a data stream is unbounded, blocking join algorithms such as sort-merge join and hash join cannot be used. The MESHJOIN algorithm was proposed for joining a continuous stream with a disk-resident relation. The crux of MESHJOIN is that the whole memory block holding the disk-based relation is replaced at each iteration. We propose dividing this memory block into a number of logical partitions and replacing only one logical partition at each iteration. Experimental results show that the service rate of the join increases because the I/O cost of one loop iteration decreases.
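The difference between replacing the whole memory block and replacing a single logical partition can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the algorithm as implemented in the paper. The function name `stream_relation_join`, its parameters, and the page layout are hypothetical, and the sketch indexes each loaded page rather than the stream window to keep the code short. It assumes the relation is read cyclically, one page per partition, and that a stream tuple leaves the window once it has been probed against every page.

```python
from collections import defaultdict, deque
from itertools import islice


def stream_relation_join(stream, relation_pages, pages_per_iteration,
                         tuples_per_iteration):
    """Join (key, payload) stream tuples with a paged relation.

    pages_per_iteration models how much of the relation buffer is
    replaced per loop iteration: all partitions (whole-block
    replacement) or just 1 (single-partition replacement).
    Returns (matches, pages_read_per_iteration).
    """
    n_pages = len(relation_pages)
    stream = iter(stream)
    window = deque()            # entries: [key, payload, pages_seen]
    matches = []
    io_per_iteration = []
    next_page = 0               # cyclic read position in the relation

    while True:
        # Admit the next batch of stream tuples into the window.
        batch = islice(stream, tuples_per_iteration)
        window.extend([key, payload, 0] for key, payload in batch)
        if not window:
            break               # stream exhausted and window drained

        pages_read = 0
        for _ in range(pages_per_iteration):
            if not window:
                break
            page = relation_pages[next_page]
            next_page = (next_page + 1) % n_pages
            pages_read += 1

            # Index the freshly loaded page by join key.
            index = defaultdict(list)
            for r_key, r_value in page:
                index[r_key].append(r_value)

            # Probe every tuple in the window against that page.
            for entry in window:
                for r_value in index.get(entry[0], ()):
                    matches.append((entry[0], entry[1], r_value))
                entry[2] += 1

            # The oldest tuples have seen the most pages; expire those
            # that have now met every page of the relation.
            while window and window[0][2] == n_pages:
                window.popleft()

        io_per_iteration.append(pages_read)

    return matches, io_per_iteration


if __name__ == "__main__":
    pages = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")],
             [(4, "e")], [(2, "f")]]
    updates = [(1, "s0"), (2, "s1"), (5, "s2"), (3, "s3"), (1, "s4")]
    whole, io_whole = stream_relation_join(updates, pages, 4, 2)
    part, io_part = stream_relation_join(updates, pages, 1, 2)
    print(sorted(whole) == sorted(part), max(io_whole), max(io_part))
```

Both settings produce the same join result, but the single-partition setting reads one page per iteration instead of four. The sketch does not model the memory budget or the service rate. Stream tuples stay in the window for more iterations when fewer pages are replaced per iteration, so the measured benefit depends on the experimental setup described in the paper.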
