Design and Implementation of a Configurable Heterogeneous Multicore SoC With Nine CPUs and Two Matrix Processors

A multicore system-on-chip (SoC) has been developed for applications such as recognition, inference, measurement, control, and security that require both high-performance processing and low power consumption. The SoC integrates three types of synthesizable processors: eight CPUs (M32R), two multi-bank matrix processors (MBMX), and one controller (M32C), operating at 1 GHz, 500 MHz, and 500 MHz, respectively. The three processor types are interconnected on chip by a high-bandwidth multi-layer system bus. The eight CPUs are connected to a common pipelined bus with a cache coherence mechanism and share a 512-kB L2 cache, which reduces internal bus traffic. The multi-bank matrix processor performs 2-read/1-write calculation with background I/O operation. The 1-GHz CPU is realized using a delay management network consisting of delay monitors that can be applied to any application or process technology. The configurable heterogeneous architecture with nine CPUs and two matrix processors reduces power consumption by 45%.
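The processor mix described above can be summarized as a small data structure. The following is only an illustrative sketch: the class, field, and role names are hypothetical, and counting the M32C controller as the ninth CPU is an interpretation of the title rather than something the abstract states explicitly. The counts, clock frequencies, and cache size are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorGroup:
    """One processor type on the SoC (field names are illustrative)."""
    name: str
    role: str        # "cpu", "matrix", or "controller" (hypothetical labels)
    count: int
    clock_mhz: int


# Values from the abstract: 8 x M32R at 1 GHz, 2 x MBMX at 500 MHz,
# 1 x M32C at 500 MHz.
SOC = (
    ProcessorGroup("M32R", "cpu", 8, 1000),
    ProcessorGroup("MBMX", "matrix", 2, 500),
    ProcessorGroup("M32C", "controller", 1, 500),
)

# L2 cache shared by the eight M32R CPUs, per the abstract.
L2_CACHE_KB = 512


def count_by_role(groups, roles):
    """Total number of processors whose role is in `roles`."""
    return sum(g.count for g in groups if g.role in roles)


# Assumption: the title's "nine CPUs" are the eight M32R cores plus the
# M32C controller.
cpu_total = count_by_role(SOC, {"cpu", "controller"})
matrix_total = count_by_role(SOC, {"matrix"})
```

Under that assumption, `cpu_total` and `matrix_total` match the nine CPUs and two matrix processors named in the title.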
