SIMD made explicit

Low energy consumption has become one of the most important concerns in computing. With a single CPU consuming as much as 115 W, engineers have been looking for ways to reduce energy consumption while maintaining high computational performance. Wide SIMD architectures are often used for this purpose: they exploit data parallelism to keep the required clock frequency low for a given performance target. In this paper, we propose a wide SIMD architecture with an explicit datapath that further improves energy efficiency without sacrificing computational power. To enable a detailed comparison, both the proposed wide SIMD architecture and its transparent-bypassing counterpart are implemented in HDL and synthesized with a TSMC 40 nm low-power library. Power estimates are derived from actual toggle rates obtained by post-synthesis simulation. Our experimental results show that explicit bypassing reduces overall energy consumption by up to 44% compared with the corresponding SIMD architecture with transparent bypassing.
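To make the difference between transparent and explicit bypassing concrete, the Python sketch below counts operand-delivery energy for one SIMD lane running a short straight-line program. It is a toy model under stated assumptions, not the architecture evaluated in this paper: the per-event energies (`E_RF_READ`, `E_RF_WRITE`, `E_BYPASS`), the one-operation-per-cycle issue model, the bypass `window`, and the names `Op` and `operand_energy` are all hypothetical. The model assumes that under transparent bypassing every result is written back to the register file and every operand is read from it, even when hardware forwarding supplies the value, while under explicit bypassing the compiler routes short-lived values directly from the producing unit and drops register-file writes that no consumer needs.

```python
from dataclasses import dataclass, field

# Hypothetical per-event energies (arbitrary units); NOT taken from the paper.
E_RF_READ = 1.0   # one operand read from the register file
E_RF_WRITE = 1.2  # one result written to the register file
E_BYPASS = 0.2    # one operand taken directly from a bypass / FU output


@dataclass
class Op:
    dst: str                                       # value produced (SSA: unique name)
    srcs: list[str] = field(default_factory=list)  # values consumed


def operand_energy(program: list[Op], explicit: bool, window: int = 1) -> float:
    """Toy operand-delivery energy for one SIMD lane (hypothetical model).

    Assumes one op issues per cycle in program order and every value is
    defined before it is used. A value is bypassable if its consumer issues
    at most `window` cycles after its producer.
    """
    issue = {op.dst: cycle for cycle, op in enumerate(program)}
    uses: dict[str, list[int]] = {op.dst: [] for op in program}
    total = 0.0

    for cycle, op in enumerate(program):
        for src in op.srcs:
            if src not in issue:
                total += E_RF_READ        # primary input lives in the register file
                continue
            dist = cycle - issue[src]
            uses[src].append(dist)
            if explicit and dist <= window:
                total += E_BYPASS         # compiler routes the value over the bypass
            else:
                # RF read: always under transparent bypassing (even if the
                # value is also forwarded), or when outside the bypass window.
                total += E_RF_READ

    for dists in uses.values():
        # Explicit: skip the write-back when every consumer used the bypass.
        # Values with no consumers are treated as live-out and always written.
        if explicit and dists and all(d <= window for d in dists):
            continue
        total += E_RF_WRITE
    return total


if __name__ == "__main__":
    # y = (a * b + c) * d, one operation per cycle
    prog = [Op("t1", ["a", "b"]), Op("t2", ["t1", "c"]), Op("y", ["t2", "d"])]
    print("transparent:", round(operand_energy(prog, explicit=False), 2))
    print("explicit:   ", round(operand_energy(prog, explicit=True), 2))
```

For `y = (a*b + c) * d`, the model yields 9.6 units with transparent bypassing and 5.6 with explicit bypassing, because the intermediates `t1` and `t2` never touch the register file. Since the energies are invented, only the direction of the difference is meaningful; the toy figures are not calibrated against the 40 nm synthesis results reported here.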
