A scalable massively parallel processor for real-time image processing

This paper describes a high-performance, scalable, massively parallel single-instruction multiple-data (SIMD) processor for power- and area-efficient real-time image processing. The SIMD processor combines 4-bit processing elements (PEs) with SRAM on a small area and thus simultaneously achieves a high performance of 191 GOPS, a high power efficiency of 310 GOPS/W, and a high area efficiency of 31.6 GOPS/mm². The applied pipeline architecture is optimized to reduce the number of controller overhead cycles, so that the SIMD parallel processing unit can be utilized during up to 99% of the operating time of typical application programs. The processor can also be optimized for low-cost, low-power, and high-performance multimedia system-on-a-chip (SoC) solutions. A combination of custom and automated implementation techniques enables scalability in the number of PEs. The processor has two operating modes: a normal frequency (NF) mode for higher power efficiency and a double frequency (DF) mode for higher performance. The combination of high area efficiency, high power efficiency, high performance, and flexibility of the SIMD processor described in this paper expands the application of real-time image processing technology to a variety of electronic devices.
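As a rough illustration of how an array of 4-bit PEs can operate on wider pixel data, the following Python sketch adds two rows of 8-bit values in two 4-bit steps, with every lane executing the same step in lockstep. It is a minimal sketch under stated assumptions, not the processor's actual datapath: the abstract does not describe the PE microarchitecture, and the function name `simd_add_4bit`, the per-lane carry register, and the step ordering are all hypothetical.

```python
NIBBLE_MASK = 0xF  # each hypothetical PE processes 4 bits per step


def simd_add_4bit(a, b, width_bits=8):
    """Element-wise add of two equal-length lists of unsigned integers,
    processed 4 bits at a time (illustrative model only).

    Returns (sums modulo 2**width_bits, carry-out bit per lane).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")

    steps = (width_bits + 3) // 4   # number of 4-bit steps per word
    result = [0] * len(a)
    carry = [0] * len(a)            # assumed one carry bit per PE

    for step in range(steps):       # one "instruction" per 4-bit slice
        shift = 4 * step
        # Conceptually all lanes execute this body at the same time (SIMD).
        for lane in range(len(a)):
            x = (a[lane] >> shift) & NIBBLE_MASK
            y = (b[lane] >> shift) & NIBBLE_MASK
            s = x + y + carry[lane]
            result[lane] |= (s & NIBBLE_MASK) << shift
            carry[lane] = s >> 4
    return result, carry


if __name__ == "__main__":
    sums, carries = simd_add_4bit([200, 15, 255], [100, 1, 1])
    print(sums, carries)
```

In this model an 8-bit addition costs two steps regardless of how many lanes there are, which is the sense in which adding PEs scales throughput; the real processor's cycle counts and instruction set are not given in the abstract.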

[7]  Hans Jürgen Mattausch,et al.  A Scalable Massively Parallel Processor for Real-Time Image Processing , 2011, IEEE J. Solid State Circuits.