EMC-Y: parallel processing element optimizing communication and computation

EMC-Y is a new processing element for highly parallel computers, designed to achieve high-performance parallel computation by fusing a dataflow mechanism with a von Neumann execution pipeline. We have already developed EMC-R, the processing element used in the EM-4 prototype. EMC-Y improves on EMC-R's packet communication performance, allowing it to tolerate more network traffic. This paper presents the architecture of EMC-Y, concentrating on the principles of its packet communication. To improve the performance of sending and transferring packets, EMC-Y uses an output packet buffer and optimal packet routing. To improve the performance of receiving packets, it changes the memory access priority for input packet buffer operation. Because EMC-Y not only improves packet input and output performance but also balances the two, it can tolerate a large amount of traffic and improve execution performance. We evaluate the improvements of the EMC-Y architecture using a clock-level simulator. The results show that, at the same clock speed, EMC-Y improves performance over EMC-R by 50% to 70% on several programs.