GATES: a grid-based middleware for processing distributed data streams

A growing number of applications rely on, or could benefit from, the analysis and monitoring of data streams. Many of these applications involve high-volume data streams and require distributed processing of data arising from a distributed set of sources. We therefore believe that a grid environment is well suited for flexible and adaptive analysis of these streams. This paper reports the design and initial evaluation of a middleware for processing distributed data streams. Our system is referred to as GATES (grid-based adaptive execution on streams). The system is designed to use existing grid standards and tools to the extent possible. It flexibly achieves the best possible accuracy while maintaining the real-time constraint on the analysis, using a self-adaptation algorithm that we have developed for this purpose. Results from a detailed evaluation of the system demonstrate the benefits of distributed processing and the effectiveness of our self-adaptation algorithm.
