Paper Information - Fully Distributed On-chip Instruction Memory Design for Stream Architecture Based on Field-Divided VLIW Compression


Large code size and poor code density have long been serious problems in VLIW processors. To address this problem and its impact on instruction memory in stream architectures, this paper proposes a novel method called field-divided VLIW compression. The method is derived from an analysis of the code characteristics of stream programs across a wide range of typical stream application domains, and it divides mutually unrelated instruction code into different subfields. Based on field-divided VLIW compression, the paper designs a fully distributed on-chip instruction memory (FDIM) for stream architectures. Experiments on the MASA stream processor show that field-divided VLIW compression reduces off-chip instruction code by about 38% and on-chip instruction memory space demand by about 66%, with little impact on program performance. FDIM reduces the area of the on-chip instruction memory by about 37%, which in turn reduces the area of the MASA stream processor by about 8.92%. In addition, the energy consumption of the instruction memory decreases by about 61%.

Yi He | Chunyuan Zhang | Tian Tian | Qianming Yang | Maolin Guan
