Resource allocation in a middleware for streaming data

A growing number of applications rely on, or can potentially benefit from, the analysis and monitoring of data streams. To support processing of streaming data in a grid environment, we have been developing a middleware system called GATES (Grid-based AdapTive Execution on Streams). Our target applications are those that involve high-volume data streams and require distributed processing of data arising from a distributed set of sources. This paper addresses the problem of resource allocation in the GATES system. Though resource discovery and resource allocation have been active topics in the grid community, the pipelined processing and real-time constraints required by distributed streaming applications pose new challenges. We present a resource allocation algorithm that is based on minimal spanning trees. We evaluate the algorithm experimentally and demonstrate that it results in configurations that are very close to optimal, and significantly better than most other possible configurations.
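For readers unfamiliar with the building block named above, the following is a minimal sketch of a spanning-tree computation over a small network of data sources, compute nodes, and a sink. It is not the GATES allocation algorithm, which the abstract does not detail; it is plain Prim's algorithm, and the node names, link costs, and the function `minimum_spanning_tree` are all hypothetical, chosen only for illustration.

```python
import heapq


def minimum_spanning_tree(nodes, links):
    """Prim's algorithm over an undirected weighted graph.

    nodes: list of node names (hypothetical, for illustration only).
    links: dict mapping (u, v) pairs to a link cost, e.g. latency.
    Returns a list of (u, v, cost) edges forming a spanning tree
    of minimum total cost, assuming the graph is connected.
    """
    # Build an adjacency list; each link is usable in both directions.
    adjacency = {n: [] for n in nodes}
    for (u, v), cost in links.items():
        adjacency[u].append((cost, u, v))
        adjacency[v].append((cost, v, u))

    start = nodes[0]
    visited = {start}
    frontier = list(adjacency[start])
    heapq.heapify(frontier)

    tree = []
    while frontier and len(visited) < len(nodes):
        cost, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        tree.append((u, v, cost))
        for edge in adjacency[v]:
            if edge[2] not in visited:
                heapq.heappush(frontier, edge)
    return tree


# Hypothetical topology: two stream sources, two compute nodes, one sink.
nodes = ["source_a", "source_b", "compute_1", "compute_2", "sink"]
links = {
    ("source_a", "compute_1"): 2.0,
    ("source_a", "compute_2"): 6.0,
    ("source_b", "compute_1"): 5.0,
    ("source_b", "compute_2"): 1.0,
    ("compute_1", "compute_2"): 3.0,
    ("compute_1", "sink"): 4.0,
    ("compute_2", "sink"): 7.0,
}

tree = minimum_spanning_tree(nodes, links)
total_cost = sum(cost for _, _, cost in tree)
```

In this toy topology the tree connects all five nodes with four links. A real allocator for pipelined stream processing would also have to account for processing capacity and real-time constraints, which this sketch ignores.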
