A Heterogeneous Cluster with Reconfigurable Accelerator for Energy Efficient Near-Sensor Data Analytics

IoT end-nodes require high performance and extreme energy efficiency to cope with complex near-sensor data analytics algorithms. Processing on multiple programmable processors operating near the threshold voltage is emerging as a promising way to exploit the energy savings of low-voltage operation while recovering the associated frequency degradation through parallelism. In this work, we present a heterogeneous cluster architecture that extends a traditional parallel processor cluster with a reconfigurable Integrated Programmable Array (IPA) accelerator. The programmable processors preserve legacy programmability, which makes it easy to manage peripherals, radio software stacks, and the global program flow, while offloading data-intensive and control-intensive kernels to the IPA yields much higher system-level performance and energy efficiency. Experimental results show that the proposed heterogeneous cluster outperforms an 8-core homogeneous architecture by up to 4.8× in performance and 4.5× in energy efficiency when executing a mix of control-intensive and data-intensive kernels typical of near-sensor data analytics applications.
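The trade-off behind near-threshold parallel computing can be illustrated with a rough first-order sketch. It assumes the textbook approximation that dynamic energy per operation scales with the square of the supply voltage, ignores leakage, and assumes ideal parallel scaling unless stated otherwise. The function names and all numeric values below are hypothetical and are not taken from this work.

```python
import math


def dynamic_energy_ratio(v_low: float, v_nominal: float) -> float:
    """Dynamic energy per operation at v_low relative to v_nominal.

    Assumes the first-order CMOS model E_dyn ~ C * V^2 (leakage ignored).
    """
    return (v_low / v_nominal) ** 2


def cores_to_match_throughput(f_nominal_mhz: float, f_low_mhz: float,
                              parallel_efficiency: float = 1.0) -> int:
    """Cores needed at the low-voltage frequency to match one nominal core.

    parallel_efficiency is a hypothetical factor in (0, 1] that models
    imperfect scaling; 1.0 means ideal speedup.
    """
    return math.ceil(f_nominal_mhz / (f_low_mhz * parallel_efficiency))


if __name__ == "__main__":
    # Hypothetical operating points, chosen only for illustration.
    ratio = dynamic_energy_ratio(v_low=0.6, v_nominal=1.2)
    cores = cores_to_match_throughput(f_nominal_mhz=400.0, f_low_mhz=50.0)
    print(f"dynamic energy per op: {ratio:.2f}x of nominal")
    print(f"cores needed to recover throughput: {cores}")
```

Under these assumptions, halving the supply voltage cuts dynamic energy per operation to a quarter, and the lost clock frequency is recovered by running more cores in parallel. Real designs also have to account for leakage and for kernels that do not parallelize perfectly.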
