Empirical study of latency hiding on a fine-grain parallel processor

Latency associated with memory accesses and process communication is one of the most difficult obstacles in constructing a practical massively parallel system. Two approaches to hiding latency have been proposed so far: prefetching and multi-threading. An instruction-level data-driven computer is an ideal test-bed for evaluating these latency-hiding methods, because both are implemented naturally on such a machine: prefetching as unfolding, and multi-threading as concurrent execution of multiple contexts. This paper evaluates latency-hiding methods on SIGMA-1, a dataflow supercomputer developed at the Electrotechnical Laboratory. The evaluation shows that these methods are effective at hiding static latencies but not dynamic latencies, and that concurrent execution of multiple contexts is more effective than prefetching.
