Memory-centric video processing

This work presents a domain-specific memory subsystem built on a two-level memory hierarchy. It targets video post-processing applications, including video enhancement and format conversion. These applications rely on motion compensation and/or a broad class of content-adaptive filtering to deliver the highest picture quality. Our approach meets the required performance while remaining flexible enough for the application domain. It aims in particular at the applications that are most challenging to implement: compute-intensive, bandwidth-demanding applications that deliver the highest quality at high picture resolutions. The lowest level of the hierarchy, the L0 scratchpad, sits closest to the processing element and is organized specifically to enable fast retrieval of an arbitrarily positioned 2-D block of pixels. To guarantee performance, most of its addressing logic is hardwired, leaving the user an API for initializing the L0 scratchpad and for storing data to and loading data from it. The next level, the L1 scratchpad, minimizes the off-chip memory bandwidth requirements and is organized specifically to enable efficient aligned block-based accesses. Because its data rates are lower than those of the L0 scratchpad and its block accesses are aligned, it uses software-based addressing to enable full flexibility. The two-level memory hierarchy exploits prefetching to further improve performance.
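To make the idea of retrieving an arbitrarily positioned 2-D block concrete, the following is a minimal sketch of one well-known parallel-memory mapping, not necessarily the organization used in this work. It assumes a hypothetical L0 built from `bw * bh` independent banks and a hypothetical function `bank_of` that assigns pixel (x, y) to a bank by modular interleaving. Under that assumption, every `bw`-by-`bh` block touches each bank exactly once, so the whole block could in principle be read in a single parallel access regardless of its position.

```python
# Illustrative only: the bank count, block size, and mapping are assumptions,
# not details taken from the abstract.

def bank_of(x, y, bw, bh):
    """Map pixel (x, y) to one of bw*bh banks by 2-D modular interleaving."""
    return (x % bw) + bw * (y % bh)

def block_banks(x0, y0, bw, bh):
    """Return the bank index of every pixel in the bw-by-bh block at (x0, y0)."""
    return [bank_of(x0 + dx, y0 + dy, bw, bh)
            for dy in range(bh)
            for dx in range(bw)]

def is_conflict_free(x0, y0, bw, bh):
    """True if no two pixels of the block fall into the same bank."""
    return len(set(block_banks(x0, y0, bw, bh))) == bw * bh

if __name__ == "__main__":
    # A 4x2 block at an unaligned position still hits all 8 banks once each.
    print(sorted(block_banks(3, 5, 4, 2)))
    print(all(is_conflict_free(x, y, 4, 2)
              for x in range(32) for y in range(32)))
```

The mapping is conflict-free because any `bw` consecutive columns cover all residues modulo `bw`, and likewise for rows. A hardwired address generator of the kind the abstract mentions would additionally have to compute per-bank addresses and reorder the returned pixels, which this sketch leaves out.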
