MaPU: A novel mathematical computing architecture

As semiconductor feature sizes scale down to 10 nm and below, it becomes possible to assemble systems with high-performance processors that can theoretically provide computational power of up to tens of PFLOPS. However, the power consumption of these systems is also rocketing up to tens of millions of watts, and the actual performance is only around 60% of the theoretical performance. Today, power efficiency and sustained performance have become the main foci of processor designers. Traditional computing architectures such as superscalar and GPGPU have proven to be power inefficient, and there is a big gap between their actual and peak performance. In this paper, we present the MaPU architecture, a novel architecture suited to data-intensive computing with great power efficiency and sustained computation throughput. To achieve this goal, MaPU attempts to optimize the application from a system perspective, including the hardware, algorithm and corresponding program model. It uses an innovative multi-granularity parallel memory system with intrinsic shuffle ability, cascading pipelines with wide SIMD data paths and a state-machine-based program model. When executing typical signal processing algorithms, a single MaPU core implemented in a 40 nm process exhibits a sustained performance of 134 GFLOPS while consuming only 2.8 W of power, which increases the actual power efficiency by an order of magnitude compared with traditional CPUs and GPGPUs.
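The power-efficiency figure implied by the numbers above can be checked with a short calculation. The sketch below is illustrative only: the function name `power_efficiency` is a hypothetical helper, not something defined in the paper, and the only inputs are the sustained throughput and power quoted in the abstract.

```python
def power_efficiency(sustained_gflops: float, power_watts: float) -> float:
    """Return sustained throughput per watt, in GFLOPS/W.

    Hypothetical helper for illustration; not part of the MaPU toolchain.
    """
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return sustained_gflops / power_watts


# Figures quoted in the abstract for a single MaPU core (40 nm process).
mapu_gflops = 134.0
mapu_watts = 2.8

efficiency = power_efficiency(mapu_gflops, mapu_watts)
print(f"{efficiency:.1f} GFLOPS/W")  # prints "47.9 GFLOPS/W"
```

Under these figures a single core sustains roughly 48 GFLOPS per watt. The abstract gives no baseline numbers for the CPUs and GPGPUs it compares against, so the order-of-magnitude claim cannot be reproduced from this text alone.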
