A low-cost mixed-mode parallel processor architecture for embedded systems

A scalable SIMD/MIMD mixed-mode parallel processor architecture called the XC core is proposed to meet the high and diverse performance requirements of embedded multimedia applications. The XC core supports both the SIMD and MIMD computing models at low hardware cost by dynamically reconfiguring itself into either datapath circuits or control circuits, thereby trading off performance against flexibility. A control processor broadcasts instructions to the whole SIMD PE (Processing Element) array or to a part of it, while a separate program is assigned to each PU (Processing Unit), which is composed mainly of the hardware resources of several PEs. RTL synthesis results show that the area overhead for reconfiguration is only 10% of the total area. Benchmark results show that the SIMD mode achieves high performance on the regular, massively data-parallel portions of applications, while the MIMD mode accelerates the remaining portions, whose implementation on a pure highly parallel SIMD architecture would otherwise be impossible. The results show that the XC core is competitive with more complex processors across a wide range of applications, in terms of both its cost efficiency as a highly parallel SIMD processor and its flexibility as a multicore MIMD processor.
