An integrated memory array processor architecture for embedded image recognition systems

Embedded processors for video image recognition must balance cost (die size and power) against real-time performance, and must also remain highly flexible because recognition targets, situations, and applications are immensely diverse. This paper describes IMAP, a highly parallel SIMD linear processor and memory array architecture that addresses these competing requirements. Despite its simple architecture, IMAP uses parallel and systolic algorithmic techniques to exploit not only the straightforward per-image-row data-level parallelism (DLP), but also the inherent DLP of other memory access patterns frequently found in image recognition tasks, all programmed in an explicitly parallel C language (IDC). We describe and evaluate IMAP-CE, the latest IMAP processor, which integrates 128 PEs (100 MHz, 8-bit, 4-way VLIW), 128 RAMs of 2 KB each, and one 16-bit RISC control processor on a single chip. The PE instruction set is enhanced to support IDC code. IMAP-CE is evaluated mainly by comparing its performance running IDC code with that of a 2.4 GHz Intel P4 running optimized C code. With these parallelizing techniques, benchmark results show a speedup of up to 20 for image filter kernels and of 4 for a full image recognition application.
