Synchronous Communication-Based Many-Core SoC

The major trend in embedded system-on-chip (SoC) design is toward chips that integrate multiple cores operating at lower frequencies, known as multiprocessor systems-on-chip (MPSoCs). These chips satisfy ever-increasing computing demands and are made possible by the steadily growing number of transistors available in SoCs. The number of integrated cores keeps increasing and now reaches hundreds on a single chip. Such systems are called many-core embedded systems. They integrate a pool of parallel processing elements and can be attached to the CPU as programmable accelerators, delivering higher performance at lower power than GPUs (graphics processing units). In fact, GPUs are designed for a power budget orders of magnitude higher than that of CPUs, which makes them unsuitable for embedded applications. This paper presents a parametric many-core embedded system that can handle synchronous regular communications. The proposed high-performance architecture can run in asynchronous computing mode while ensuring synchronous communications, which improves the efficiency of the routing mechanism and achieves the best possible speed-up for a wide range of data-parallel applications. The paper evaluates the main parallel execution modes in use and compares the proposed computation model with the others in terms of execution time. The performance of the proposed FPGA-based many-core hardware architecture is evaluated with various data-parallel synthetic and application benchmarks, including odd-even sorting and 2D image edge detection. Experimental results show that the proposed execution-mode scheduling for a 2D image filtering algorithm can yield a speed-up of about 20% compared to a purely asynchronous implementation.
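
As an illustration of the kind of regular, neighbor-to-neighbor communication that the odd-even sorting benchmark involves, the following is a minimal sequential sketch of odd-even transposition sort. It is not the paper's implementation: the function name `odd_even_transposition_sort`, the mapping of one value per processing element (PE), and the use of a list snapshot to model a synchronous exchange step are all assumptions made for illustration.

```python
def odd_even_transposition_sort(values):
    """Sort a sequence with alternating even and odd compare-exchange phases.

    Hypothetical model: pes[i] is the single value held by processing
    element (PE) i. Real hardware would run each pair in parallel.
    """
    pes = list(values)
    n = len(pes)
    for phase in range(n):  # n phases are sufficient to sort n elements
        # Even phases pair (0,1), (2,3), ...; odd phases pair (1,2), (3,4), ...
        start = phase % 2
        # Communication step (assumed synchronous): every PE sees the
        # neighbor values as they were at the start of the phase.
        snapshot = list(pes)
        for left in range(start, n - 1, 2):
            right = left + 1
            # Computation step: the left PE keeps the smaller value,
            # the right PE keeps the larger one.
            low, high = sorted((snapshot[left], snapshot[right]))
            pes[left], pes[right] = low, high
    return pes


if __name__ == "__main__":
    print(odd_even_transposition_sort([5, 3, 8, 1, 9, 2]))
```

Because the pairs within a phase are disjoint, every compare-exchange in that phase could proceed independently once the neighbor values have been exchanged, which is why a fixed, regular communication pattern suits this benchmark.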
