Self-Adaptation of Parallel Applications in Heterogeneous and Dynamic Architectures

This paper defines a mechanism for adapting parallel data flow computations in dynamic and heterogeneous environments. The mechanism is especially useful for massively parallel multi-threaded computations, such as those found in cluster and grid computing. Because the state of an execution is represented by a data flow graph, the approach is highly flexible with respect to application-induced adaptation of the parallel computation. Such adaptation reflects the need to change runtime behavior in response to parameters observable at run time. Specifically, it allows on-line adaptation of parallel execution in dynamic heterogeneous systems. We have implemented this mechanism in KAAPI (Kernel for Adaptative and Asynchronous Parallel Interface), and experimental results show that the induced overhead is small.
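The idea of holding execution state in a data flow graph and adapting on-line can be illustrated with a minimal Python sketch. This is not the KAAPI interface: `Task`, `variants`, `run`, and `observe` are hypothetical names introduced for illustration only, and the sketch runs sequentially. It assumes each task carries several interchangeable implementations, and that a runtime-observable parameter is sampled just before each task runs to choose among them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    name: str
    deps: List[str]                 # names of tasks whose results this task reads
    variants: Dict[str, Callable]   # hypothetical: one implementation per observed condition


def run(graph: Dict[str, Task], observe: Callable[[], str]) -> Tuple[dict, list]:
    """Execute tasks in data flow order, re-selecting a variant before each task."""
    results: dict = {}
    trace: list = []                # (task name, variant chosen), in execution order
    pending = dict(graph)
    while pending:
        # A task is ready once all of its inputs have been produced.
        ready = [t for t in pending.values() if all(d in results for d in t.deps)]
        if not ready:
            raise ValueError("cycle or missing dependency in data flow graph")
        for task in ready:
            kind = observe()        # sample the runtime parameter (assumed cheap)
            chosen = kind if kind in task.variants else "default"
            args = [results[d] for d in task.deps]
            results[task.name] = task.variants[chosen](*args)
            trace.append((task.name, chosen))
            del pending[task.name]
    return results, trace


# Usage: the third observation reports a different condition, so only the
# last task switches to its alternative implementation.
observations = iter(["default", "default", "fast"])
graph = {
    "a": Task("a", [], {"default": lambda: 2}),
    "b": Task("b", [], {"default": lambda: 3}),
    "sum": Task("sum", ["a", "b"], {
        "default": lambda x, y: x + y,
        "fast": lambda x, y: sum((x, y)),
    }),
}
results, trace = run(graph, lambda: next(observations))
```

Because the graph records which results exist and which tasks are still pending, the choice of implementation can be deferred until a task is about to run. A real runtime would schedule ready tasks across threads or nodes instead of iterating over them in a loop.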
