Distributed processing of continuous join queries using DHT networks

This paper addresses the problem of computing approximate answers to continuous join queries. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) with query dissemination over the trees formed by the underlying DHT links. DHTJoin distributes the query workload across multiple DHT nodes and provides a mechanism that avoids indexing tuples that cannot contribute to join results. Our performance evaluation shows that DHTJoin can significantly reduce network traffic.
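To make the idea of hash-based tuple placement concrete, the following is a minimal Python sketch, not the DHTJoin algorithm itself. It assumes a single equi-join between two relations, models the DHT as a fixed list of in-memory stores, and skips indexing for tuples whose relation appears in no registered query. The names `ToyDHT`, `register_join`, `node_for`, and `insert` are hypothetical and introduced only for illustration; sliding windows, approximate answers, and tree-based query dissemination are not modeled.

```python
import hashlib
from collections import defaultdict


class ToyDHT:
    """Hypothetical in-memory stand-in for a DHT with a fixed number of nodes."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # One store per node: (relation, join_value) -> list of tuples.
        self.stores = [defaultdict(list) for _ in range(num_nodes)]
        self.join_attr = {}  # relation -> its join attribute
        self.partner = {}    # relation -> the relation it joins with

    def node_for(self, value):
        """Map a join-attribute value to the node responsible for it."""
        digest = hashlib.sha1(str(value).encode()).hexdigest()
        return int(digest, 16) % self.num_nodes

    def register_join(self, left, left_attr, right, right_attr):
        """Register a continuous equi-join left.left_attr = right.right_attr."""
        self.join_attr[left] = left_attr
        self.join_attr[right] = right_attr
        self.partner[left] = right
        self.partner[right] = left

    def insert(self, relation, tup):
        """Place a new tuple and return the join results it produces."""
        # Assumption: a tuple whose relation is in no registered query
        # cannot contribute to any result, so it is not indexed at all.
        if relation not in self.join_attr:
            return []
        value = tup[self.join_attr[relation]]
        store = self.stores[self.node_for(value)]
        # Tuples of both relations with equal join values hash to the same
        # node, so the join is evaluated locally at that node.
        matches = store.get((self.partner[relation], value), [])
        results = [(tup, other) for other in matches]
        store[(relation, value)].append(tup)
        return results


dht = ToyDHT(num_nodes=4)
dht.register_join("R", "a", "S", "b")
dht.insert("R", {"a": 1, "x": "r1"})           # no S tuple yet, no result
joined = dht.insert("S", {"b": 1, "y": "s1"})  # meets the R tuple at one node
ignored = dht.insert("T", {"c": 1})            # T is in no query, not indexed
```

Because placement depends only on the join value, matching tuples meet at a single node without any broadcast, which is where the network-traffic savings of this kind of scheme would come from.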
