MULTITHREADING AND THREAD MIGRATION USING MPI AND MYRINET

The balance between CPU speed and interconnection network throughput in distributed-memory parallel computers varies with each generation of systems, but the trend is that CPUs are gaining performance faster than interconnection networks. Remote data accesses are therefore becoming more expensive relative to local accesses in terms of CPU cycles, and remote memory access mechanisms that suited a previous generation of parallel machines may be less appropriate for current clusters. This research evaluates a multithreaded programming paradigm that uses cached remote memory accesses and thread migration to exploit array locality on a Myrinet cluster. The approach, called Nomadic Threads, was originally developed for the CM5 and has been adapted to use MPI on Linux clusters. The results show that the current surfeit of CPU power relative to network throughput dramatically changes the scaling characteristics of some programs, while others behave much as they did on the decade-old CM5.
