Distributed data streams indexing using content-based routing paradigm

In recent years, we have seen a dramatic increase in the use of data-centric distributed systems such as global grid infrastructures, sensor networks, network monitoring, and various publish-subscribe systems. Realizing the potential of these systems requires adequate middleware support for deploying and maintaining them. To this end, we propose an integrated distributed indexing architecture that supports scalable handling of intense, dynamic information flows. The architecture is geared towards providing timely responses to queries of different types while minimizing the use of network and computational resources. The underlying communication framework ensures scalability and load balancing of communication, as well as adaptivity in the presence of dynamic changes. We elaborate on the database and content-based routing methodologies used in the integrated solution, as well as the non-trivial interaction between them, and thereby provide valuable feedback to the designers of these techniques. We demonstrate the effectiveness of our architecture with performance results obtained using our prototype implementation on top of the Chord system simulator.
