Experience with executing shared memory programs using fine-grain communication and multithreading in EM-4

We present our experience and results from executing shared memory application programs on the EM-4 multiprocessor using fine-grain remote memory access and multithreading. The EM-4 is a distributed memory multiprocessor with a dataflow mechanism. This mechanism allows a fine-grain communication packet arriving through the network to invoke a thread of control dynamically with very small overhead, and it is extended to access remote memory in other processors. We hide remote memory access latencies with multithreading. The benchmark results show that shared memory applications achieve reasonable speedup with four to eight threads on the EM-4 prototype. We also found that aggressive multithreading can degrade the performance of the network interface and increase network contention. Finally, we describe EM-C, the EM-4 parallel programming language, which provides a global address space and parallel constructs for exploiting medium-grain parallelism to tolerate the latencies of remote operations.
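Why a handful of threads can be enough to hide remote access latency can be illustrated with a small, idealized model. The sketch below is a generic cycle-level simulation, not the EM-4 scheduler or its measured behavior. Its assumptions are these: each thread computes for `run_len` cycles and then issues one remote read with a fixed `latency` (both hypothetical parameters), thread switching costs nothing, and the network has no contention. Under these assumptions, processor utilization approaches `min(1, N * run_len / (run_len + latency))` for `N` threads.

```python
from collections import deque


def simulate(num_threads: int, run_len: int, latency: int, cycles: int = 100_000) -> float:
    """Return the fraction of cycles a single processor spends doing useful work.

    Idealized model (an assumption, not EM-4's actual mechanism): each thread
    runs for `run_len` cycles, then issues a remote read that takes `latency`
    cycles to return, during which the thread is suspended. Switching to
    another ready thread is free and latency is fixed (no network contention).
    """
    ready = deque(range(num_threads))  # threads that can run right now
    wake_at = {}                       # thread id -> cycle when its remote read returns
    current, remaining = None, 0       # running thread and its remaining run length
    busy = 0
    for t in range(cycles):
        # Threads whose remote reads have completed become ready again.
        for tid, when in list(wake_at.items()):
            if when <= t:
                del wake_at[tid]
                ready.append(tid)
        # If the processor is free, start the next ready thread.
        if current is None and ready:
            current, remaining = ready.popleft(), run_len
        if current is not None:
            busy += 1
            remaining -= 1
            if remaining == 0:
                # The thread issues a remote read and suspends until the reply arrives.
                wake_at[current] = t + 1 + latency
                current = None
    return busy / cycles


def model(num_threads: int, run_len: int, latency: int) -> float:
    """Closed-form utilization for the same idealized model."""
    return min(1.0, num_threads * run_len / (run_len + latency))


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, simulate(n, 10, 30), model(n, 10, 30))
```

In this toy model, utilization grows linearly with the thread count until roughly `(run_len + latency) / run_len` threads, after which extra threads add nothing. Because the model ignores network contention, it cannot show the effect reported above, where aggressive multithreading loads the network interface and increases contention.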