Hierarchical Instruction Register Organization

This paper analyzes a range of architectures for efficient delivery of VLIW instructions for embedded media kernels. The analysis takes an efficient filter cache as a baseline and examines the benefits of (1) removing the tag overhead, (2) distributing the storage, (3) adding indirection, (4) adding efficient NOP generation, and (5) sharing instruction memory. The result is a hierarchical instruction register organization that saves 56% in energy and 40% in area relative to an already efficient filter cache.
