The EM-X parallel computer: architecture and basic performance

Latency tolerance is essential for achieving high performance on parallel computers, particularly for remote function calls and fine-grained remote memory accesses. EM-X supports interprocessor communication directly on its execution pipeline using small, simple packets. It can create a packet in one cycle and can receive a packet from the network into an on-chip buffer without interrupting execution. EM-X invokes threads on packet arrival, which minimizes the overhead of thread switching. It tolerates communication latency through efficient multithreading and by optimizing the packet flow of fine-grained communication. EM-X also supports synchronization of two operands, direct remote memory read/write operations, and flexible packet scheduling with priorities. This paper describes the distinctive features of the EM-X architecture and reports its performance on small synthetic programs and on larger, more realistic programs.
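As a rough illustration of the mechanisms named above (thread invocation on packet arrival, two-operand synchronization, and priority-based packet scheduling), the following is a minimal software sketch. It is not the EM-X hardware design: the names `Packet`, `ProcessingElement`, `receive`, and `run`, the packet fields, and the convention that a lower priority value is served first are all assumptions made for this example.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Packet:
    # Hypothetical packet layout; only priority and seq take part in ordering.
    priority: int                       # lower value is served first (assumption)
    seq: int                            # arrival order, keeps FIFO within a priority
    thread: str = field(compare=False)  # name of the thread to invoke
    slot: int = field(compare=False)    # matching slot for two-operand sync
    value: int = field(compare=False)   # operand carried by the packet


class ProcessingElement:
    """Toy model of one processing element driven purely by packets."""

    def __init__(self, threads):
        self.threads = threads  # thread name -> two-argument function
        self.queue = []         # stands in for the on-chip packet buffer
        self.waiting = {}       # slot -> first operand that arrived
        self.results = []
        self._seq = count()

    def receive(self, thread, slot, value, priority=1):
        # Buffer an incoming packet; nothing running is interrupted.
        packet = Packet(priority, next(self._seq), thread, slot, value)
        heapq.heappush(self.queue, packet)

    def run(self):
        # Each dequeued packet either waits for its partner operand
        # or, if the partner is already present, invokes the thread.
        while self.queue:
            p = heapq.heappop(self.queue)
            if p.slot in self.waiting:
                first = self.waiting.pop(p.slot)
                self.results.append(self.threads[p.thread](first, p.value))
            else:
                self.waiting[p.slot] = p.value
        return self.results


pe = ProcessingElement({"add": lambda a, b: a + b, "mul": lambda a, b: a * b})
pe.receive("add", slot=0, value=2)
pe.receive("add", slot=0, value=3)
pe.receive("mul", slot=1, value=4, priority=0)  # high priority, served first
pe.receive("mul", slot=1, value=5, priority=0)
print(pe.run())
```

In this sketch the two high-priority `mul` packets are matched and executed before the earlier `add` packets, which shows how priority scheduling can reorder work while operand matching still gates when a thread actually fires.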
