Configurable Data Memory for Multimedia Processing

In modern multimedia applications, the memory bottleneck can be alleviated with special stride data accesses. Data elements of a stride access can be retrieved in parallel from parallel memories, where the idea is to increase memory bandwidth by operating several memory modules in parallel and to feed the processor with only the necessary data. Previous research describes arbitrary stride access capability for interleaved memories, in which the skewing scheme is changed at run time according to the stride currently in use. This paper presents improved schemes that are adapted to parallel memories. The proposed parallel memory implementation allows conflict-free accesses with all constant strides, which has not been possible in prior application-specific parallel memories. Moreover, the possible access locations are unrestricted, and the number of accessed data elements equals the number of memory modules. Timing and area estimates are given for an Altera Stratix FPGA and a 0.18 micrometer CMOS process with memory module counts from 2 to 32. The FPGA results show a 129 MHz clock frequency for a system with 16 memory modules when the read and write latencies are 3 and 2 clock cycles, respectively. The complexity of the proposed system is shown to be a trade-off between application-specific and highly configurable parallel memory systems.
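
To illustrate why the skewing scheme has to follow the stride, the following minimal sketch compares two address-to-module mappings. It is not the scheme proposed in this paper. It assumes a power-of-two number of modules and a positive constant stride, and the function names are hypothetical. Under plain low-order interleaving (module = address mod N), a stride that shares a factor with N sends several of the N accesses to the same module. Shifting out the trailing zero bits of the stride before taking the modulo is one textbook-style way to spread the accesses over all modules.

```python
def modules_low_order(base, stride, n_modules):
    """Module hit by each of n_modules accesses under low-order interleaving."""
    return [(base + i * stride) % n_modules for i in range(n_modules)]


def modules_stride_shifted(base, stride, n_modules):
    """Illustrative stride-dependent mapping (an assumption, not the paper's scheme).

    Writes stride = sigma * 2**shift with sigma odd, then maps an address to
    module (address >> shift) mod n_modules. Assumes stride > 0 and that
    n_modules is a power of two.
    """
    shift = (stride & -stride).bit_length() - 1  # trailing zero bits of stride
    return [((base + i * stride) >> shift) % n_modules for i in range(n_modules)]


def is_conflict_free(modules):
    """True when every access goes to a different memory module."""
    return len(set(modules)) == len(modules)


if __name__ == "__main__":
    n = 8
    for stride in (3, 4, 12):
        low = modules_low_order(3, stride, n)
        shifted = modules_stride_shifted(3, stride, n)
        print(stride, is_conflict_free(low), is_conflict_free(shifted))
```

Because the shift amount depends on the stride, the mapping in this sketch changes whenever the stride changes, which mirrors the run-time scheme switching mentioned above. The sketch models only module selection; the address within each module and the data permutation network are left out.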
