Supporting self-adaptation in streaming data mining applications

In many classes of applications, users are flexible about output quality, while other constraints, such as the need for real-time or interactive response, are more crucial. This paper presents and evaluates a runtime algorithm for supporting adaptive execution of such applications. The particular domain we target is distributed data mining on streaming data. This work has been done in the context of a middleware system called GATES (grid-based adaptive execution on streams) that we have been developing. The self-adaptation algorithm we present and evaluate has the following characteristics. First, it carefully evaluates the long-term load at each processing stage. Second, it considers the different possible combinations of load at a processing stage and at its next stages, and decides whether the value of an adaptation parameter needs to be modified and, if so, in which direction. Third, to find the ideal new value of an adaptation parameter, it performs a binary search over the specified range of the parameter. To evaluate the self-adaptation algorithm in our middleware, we have implemented two streaming data mining applications. The main observations from our experiments are as follows. First, our algorithm converges quickly to stable values of the adaptation parameter, for different data arrival rates and independent of the specified initial value. Second, in a dynamic environment, the algorithm adapts the processing rapidly. Finally, in both static and dynamic environments, the algorithm clearly outperforms both the algorithm described in our earlier work and an obvious alternative based on linear updates.
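The binary search over the parameter range can be illustrated with a minimal sketch. This is not the GATES implementation: the class name `BinarySearchTuner`, the three-valued load signal, the tolerance band, and the assumption that a larger parameter value means more load are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinarySearchTuner:
    """Hypothetical tuner: narrows [lo, hi] around a sustainable parameter value."""
    lo: float      # lower end of the user-specified parameter range
    hi: float      # upper end of the user-specified parameter range
    value: float   # current value of the adaptation parameter

    def update(self, load_signal: int) -> float:
        """Adjust the parameter from a long-term load signal.

        Assumption: load_signal > 0 means the stage is overloaded,
        < 0 means it is underloaded, and 0 means the load is acceptable.
        Assumption: a larger parameter value produces more load.
        """
        if load_signal > 0:
            self.hi = self.value      # current value is too high
        elif load_signal < 0:
            self.lo = self.value      # current value is too low
        else:
            return self.value         # stable: leave the parameter alone
        self.value = (self.lo + self.hi) / 2
        return self.value


def simulate(target: float, initial: float, steps: int = 20, tol: float = 0.01) -> float:
    """Toy environment: the stage is 'balanced' when value is within tol of target."""
    tuner = BinarySearchTuner(lo=0.0, hi=1.0, value=initial)
    for _ in range(steps):
        v = tuner.value
        signal = 1 if v > target + tol else (-1 if v < target - tol else 0)
        tuner.update(signal)
    return tuner.value
```

In this toy setting, calling `simulate(0.3, 0.9)` and `simulate(0.3, 0.05)` both settle within the tolerance band around 0.3, which mirrors the claim that convergence does not depend on the initial value. A linear-update scheme would instead move the parameter by a fixed step each round, so the number of rounds would grow with the distance from the initial value to the stable value.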
