A VLIW Vector Media Coprocessor With Cascaded SIMD ALUs

High-definition video applications, such as digital TV and digital video cameras, require high processing performance to deliver high-quality images in addition to running a complex video CODEC. Pre-/postprocessing for improving video quality is also becoming much more important, yet its requirements vary among applications and its processing algorithms have not yet stabilized. A new processor architecture with a highly parallel datapath is therefore needed. In this paper, we introduce a VLIW vector media coprocessor, the "vector coprocessor (VCP)," which includes three asymmetric execution pipelines with cascaded SIMD ALUs. To improve performance efficiency, we reduce the area ratio of the control circuitry while increasing that of the arithmetic circuitry. The total gate count of VCP is 1268 kgates, and its maximum operating frequency is 300 MHz in a 90-nm CMOS process. We evaluate several processing kernels of an adaptive prefilter used as preprocessing for video encoding. For the edgeness and sum-of-absolute-differences kernels, VCP achieves 183 giga operations per second (GOPS). VCP thus offers sufficient performance for HD video processing and good cost-performance, with all pipeline units operating effectively.
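To make the sum-of-absolute-differences (SAD) kernel concrete, here is a minimal Python sketch. It is not VCP code. It assumes the kernel compares two equally sized pixel blocks. It also assumes, as a hypothetical illustration of "cascaded SIMD ALUs," that one stage computes lane-wise absolute differences across a row and passes its result straight to a second stage that accumulates it. The function names `abs_diff_stage`, `accumulate_stage`, and `sad` are illustrative only.

```python
from typing import List, Sequence

Block = Sequence[Sequence[int]]


def abs_diff_stage(row_a: Sequence[int], row_b: Sequence[int]) -> List[int]:
    """Hypothetical first SIMD stage: lane-wise |a - b| across one row."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same width")
    return [abs(a - b) for a, b in zip(row_a, row_b)]


def accumulate_stage(lanes: Sequence[int], acc: int) -> int:
    """Hypothetical cascaded second stage: reduce the lanes into a running sum."""
    return acc + sum(lanes)


def sad(block_a: Block, block_b: Block) -> int:
    """Sum of absolute differences between two equally sized pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same height")
    acc = 0
    for row_a, row_b in zip(block_a, block_b):
        # The output of stage 1 feeds stage 2 directly, one row at a time,
        # so no intermediate result is written back to a register file.
        acc = accumulate_stage(abs_diff_stage(row_a, row_b), acc)
    return acc


if __name__ == "__main__":
    current = [[10, 12], [14, 16]]
    reference = [[11, 12], [10, 20]]
    print(sad(current, reference))  # 1 + 0 + 4 + 4 = 9
```

For scale, suppose the 183 GOPS figure is measured at the 300 MHz maximum clock (an assumption). It then corresponds to roughly 183 × 10^9 / (300 × 10^6) = 610 operations per cycle across all pipelines.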