Using a serial cache for energy efficient instruction fetching

The design of a high-performance fetch architecture can be challenging due to poor interconnect scaling and energy concerns. Way prediction has been presented as one means of scaling the fetch engine to shorter cycle times while providing energy-efficient instruction cache accesses. However, way prediction requires additional complexity to handle mispredictions. In this paper, we examine a high-bandwidth fetch architecture augmented with an instruction cache way predictor. We compare the performance and energy efficiency of this architecture to both a serial-access cache and a parallel-access cache. Our results show that a serial fetch architecture achieves approximately the same energy reduction and performance as way-prediction architectures, without the added structures and recovery complexity that way prediction needs.
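The three access schemes compared above differ in how many data-array ways they activate and how many stages a lookup takes. The following is a minimal, hypothetical cost model, not the paper's simulation methodology. It assumes every access hits and counts only data-array way reads as an energy proxy, ignoring tag energy. It also treats a way misprediction as one extra recovery stage that reads the remaining ways, and it does not model whether a fetch pipeline can overlap the extra serial stage. All names (`AccessCost`, `parallel_cost`, `serial_cost`, `way_predicted_cost`) are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessCost:
    cycles: float          # lookup latency in pipeline stages (abstract unit)
    data_way_reads: float  # data-array ways activated per access (energy proxy)


def parallel_cost(ways: int) -> AccessCost:
    # Tags and all data ways are read together; the hit way is selected afterwards.
    return AccessCost(cycles=1, data_way_reads=ways)


def serial_cost(ways: int) -> AccessCost:
    # Tag comparison finishes first, then only the matching data way is read
    # one stage later. In this model energy does not depend on associativity.
    return AccessCost(cycles=2, data_way_reads=1)


def way_predicted_cost(ways: int, accuracy: float) -> AccessCost:
    # Correct prediction: only the predicted data way is read, in one stage.
    # Misprediction (hypothetical recovery policy): the predicted way plus the
    # remaining ways - 1 ways are read, costing one extra stage.
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    miss = 1.0 - accuracy
    return AccessCost(
        cycles=accuracy * 1 + miss * 2,
        data_way_reads=accuracy * 1 + miss * ways,
    )


if __name__ == "__main__":
    ways = 4
    for name, cost in [
        ("parallel", parallel_cost(ways)),
        ("serial", serial_cost(ways)),
        ("way-predicted (90%)", way_predicted_cost(ways, 0.9)),
    ]:
        print(f"{name:>20}: {cost.cycles:.2f} stages, "
              f"{cost.data_way_reads:.2f} way reads")
```

Under these assumptions, serial access reads a single data way on every access without needing a predictor or a recovery path. It pays for this with a fixed extra stage. Way prediction approaches that energy only as its accuracy approaches 100%.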
