Dynamic Characteristics of Multithreaded Execution in the EM-X Multiprocessor

Multithreading is known to be effective for tolerating communication latency in distributed-memory multiprocessors. Two types of support for multithreading have been used to date: software and hardware. This paper presents the impact of multithreading on performance through empirical studies. In particular, we explicate the performance difference between software support and hardware support on the 80-processor EM-X distributed-memory multiprocessor, which we have designed and implemented. The EM-X provides three types of hardware support for fine-grain multithreading: direct remote memory access, fast thread invocation, and dedicated instructions for generating fixed-sized communication packets. To demonstrate the effect of multithreading, we have performed various experiments using micro-benchmark programs and MP3D, one of the SPLASH benchmarks. Three performance parameters have been measured: processor efficiency, remote memory latency, and network load. Experimental results indicate that the EM-X architecture is highly effective at supporting the multithreading principles of execution through dedicated hardware and software.

Keywords: multithreading, latency hiding, fine-grain communication, direct remote memory access, shared-memory benchmark, synthetic workload.
