A Reconfigurable Run-Time System for Filter-Stream Applications

The development of high-level abstractions for programming distributed systems is becoming a crucial effort in computer science. Several frameworks have been proposed that expose simplified programming abstractions, which are useful for a broad class of applications and can be implemented efficiently on distributed systems. One such system is Anthill, based on the filter-stream programming model, in which applications are decomposed into sets of independent filters that communicate via streams. Anthill achieves high performance by allowing filters to be transparently replicated across several compute nodes. In this paper we present a global state manager for Anthill, which exports a simple abstraction for manipulating the state variables of application filters. The state of a filter is distributed transparently among the instances of that filter, and our manager is designed to allow data migration from one filter instance to another, enabling Anthill to dynamically reconfigure applications at execution time. To evaluate our system, we used two well-known data mining algorithms: Apriori and k-means. Our results show that the framework incurs low overhead, 1.8% on average, and that the resulting system can effectively make use of new resources as they become available, with execution times 3.57% slower, on average, than the minimum expected time for the reconfiguration scenario.
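The idea of filter state that is partitioned across instances and migrated on reconfiguration can be illustrated with a minimal sketch. This is not Anthill's API: the class `GlobalStateSketch`, its methods, and the hash-modulo ownership rule are all assumptions made for illustration, and the partitions are plain in-process dictionaries rather than remote filter instances.

```python
import zlib
from typing import Any, Dict, List


class GlobalStateSketch:
    """Hypothetical key/value state split across filter instances.

    Each partition stands in for the state held by one instance.
    Ownership is decided by a hash of the key (an assumed policy).
    """

    def __init__(self, num_instances: int) -> None:
        if num_instances < 1:
            raise ValueError("need at least one filter instance")
        self.partitions: List[Dict[str, Any]] = [{} for _ in range(num_instances)]

    def _owner(self, key: str) -> int:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key: str, value: Any) -> None:
        self.partitions[self._owner(key)][key] = value

    def get(self, key: str) -> Any:
        return self.partitions[self._owner(key)][key]

    def add_instance(self) -> int:
        """Add one instance and migrate every key whose owner changed.

        Returns the number of keys that moved to a different instance.
        """
        old = self.partitions
        self.partitions = [{} for _ in range(len(old) + 1)]
        moved = 0
        for old_index, partition in enumerate(old):
            for key, value in partition.items():
                new_index = self._owner(key)
                self.partitions[new_index][key] = value
                if new_index != old_index:
                    moved += 1
        return moved


if __name__ == "__main__":
    state = GlobalStateSketch(num_instances=2)
    for i in range(10):
        state.put(f"itemset-{i}", i)
    migrated = state.add_instance()
    print(f"{migrated} keys migrated to a new owner")
```

Hash-modulo ownership keeps the sketch short but moves many keys whenever the instance count changes; a real system would likely choose a policy that limits migration volume.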
