Fully Distributed On-chip Instruction Memory Design for Stream Architecture Based on Field-Divided VLIW Compression

Large code size and poor code density have long been serious problems for VLIW processors. To address these problems and their impact on the instruction memory of stream architectures, this paper proposes a novel method called field-divided VLIW compression. The method is derived by analyzing the code characteristics of stream programs across a wide range of typical stream application domains and dividing mutually unrelated parts of the instruction code into different subfields. Based on field-divided VLIW compression, this paper designs a fully distributed on-chip instruction memory (FDIM) for stream architectures. Experiments on the MASA stream processor demonstrate that field-divided VLIW compression reduces off-chip instruction code size by about 38% and on-chip instruction memory space demand by about 66%, with little impact on program performance. FDIM reduces the area of the on-chip instruction memory by about 37%, which in turn reduces the area of the MASA stream processor by about 8.92%. In addition, the energy consumption of the instruction memory decreases by about 61%.
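The general idea of splitting a VLIW word into independent subfields and storing each one separately can be illustrated with a minimal sketch. This is not the paper's actual encoding: the per-word presence mask, the `NOP` marker, and the function names `compress` and `decompress` are all assumptions made for illustration. Each field gets its own stream (standing in for one distributed memory bank), and idle slots are simply not stored.

```python
from typing import List, Optional, Tuple

# Hypothetical marker for an idle slot in a VLIW word (an assumption of this sketch).
NOP = None

Word = Tuple[Optional[str], ...]


def compress(program: List[Word], num_fields: int) -> Tuple[List[int], List[List[str]]]:
    """Split each VLIW word into per-field streams, dropping NOP slots.

    Returns one presence bitmask per word and one operation stream per field.
    """
    masks: List[int] = []
    fields: List[List[str]] = [[] for _ in range(num_fields)]
    for word in program:
        mask = 0
        for i, op in enumerate(word):
            if op is not NOP:
                mask |= 1 << i          # bit i set: field i is active in this word
                fields[i].append(op)    # store the op only in its own field stream
        masks.append(mask)
    return masks, fields


def decompress(masks: List[int], fields: List[List[str]], num_fields: int) -> List[Word]:
    """Rebuild the full-width VLIW words from the masks and field streams."""
    cursors = [0] * num_fields          # independent read position per field stream
    program: List[Word] = []
    for mask in masks:
        word: List[Optional[str]] = []
        for i in range(num_fields):
            if (mask >> i) & 1:
                word.append(fields[i][cursors[i]])
                cursors[i] += 1
            else:
                word.append(NOP)
        program.append(tuple(word))
    return program


# Toy three-field program with made-up mnemonics.
program = [("add", NOP, "ld"), (NOP, NOP, "st"), ("mul", "sub", NOP)]
masks, fields = compress(program, 3)
stored_slots = sum(len(f) for f in fields)
```

In this toy program only five of the nine slots are stored, and `decompress(masks, fields, 3)` reproduces the original words. A real design would also have to account for the cost of the masks and for how field streams are addressed across branches, which this sketch ignores.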
