A High-Performance, Hierarchical Decoupled Architecture

This paper presents a novel, high-performance decoupled architecture called HiDISC (Hierarchical Decoupled Instruction Stream Computer). HiDISC provides high performance for loop-based scientific programs by exploiting instruction-level parallelism and by improving memory system performance through decoupled prefetching. We present the HiDISC architecture, a sample program that shows how the architecture works, and simulation results for nine scientific benchmarks and one symbolic benchmark. The performance advantage of HiDISC increases as the miss penalty grows relative to the processor cycle time, making it an attractive architecture as the gap between processor speed and DRAM speed continues to grow exponentially.
