Function-Level Processor (FLP): Raising efficiency by operating at function granularity for market-oriented MPSoC

The exponential growth in computation demand drives chip vendors toward heterogeneous architectures that combine Instruction-Level Processors (ILPs) and custom hardware accelerators (HWACCs) to provide the needed processing capability while meeting power and energy budgets. ILPs are highly flexible but power inefficient. Custom HWACCs are highly power efficient but inflexible, since each targets a dedicated kernel. Because designing HWACCs for every application is cost prohibitive, large portions of applications still run inefficiently on ILPs. New processing architectures are needed that combine the power efficiency of HWACCs with enough flexibility to realize applications across targeted market segments. This paper introduces Function-Level Processors (FLPs) to fill the gap between ILPs and dedicated HWACCs. An FLP consists of configurable Function Blocks (FBs), each implementing a selected function, interconnected via programmable point-to-point connections to form an extensible, configurable macro data-path. An FLP raises the programming abstraction to a Function-Set Architecture (FSA) that controls FB allocation, configuration, and scheduling. We demonstrate FLP benefits with an industrial example, the Pipelined Vision Processor (PVP). We highlight the gained flexibility by mapping 10 embedded vision applications entirely onto the FLP-PVP, which delivers up to 22.4 GOPS (giga-operations per second) at an average power of 120 mW. The results also show that our FLP-PVP solution consumes 14× to 18× less power than an ILP and 5× less power than a hybrid ILP+HWACC solution.
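The FSA responsibilities named above (allocating FBs, configuring them, and chaining them through point-to-point links into a macro data-path) can be sketched in Python. This is a minimal illustration under stated assumptions, not the PVP's actual programming interface: the class and method names (`MacroDatapath`, `allocate`, `connect`, `schedule`), the FB kinds (`conv`, `threshold`, `histogram`), and their parameters are all hypothetical, and every FB is assumed to have a single input port. The "schedule" here is simply a producer-before-consumer firing order obtained by topological sort.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class FunctionBlock:
    """A configurable Function Block (FB); kind and params are hypothetical."""
    name: str
    kind: str
    params: dict = field(default_factory=dict)


class MacroDatapath:
    """Hypothetical FSA-level view of an FLP: allocate FBs from a fixed pool,
    configure them, wire them point-to-point, and derive a firing order."""

    def __init__(self, available: dict[str, int]):
        # Number of physical FBs of each kind on the (hypothetical) FLP.
        self.available = dict(available)
        self.blocks: dict[str, FunctionBlock] = {}
        # Assumption: single-input FBs, so each consumer maps to one producer.
        self.edges: dict[str, str] = {}

    def allocate(self, name: str, kind: str, **params) -> FunctionBlock:
        # Allocation: claim a free FB of the requested kind and configure it.
        if self.available.get(kind, 0) == 0:
            raise RuntimeError(f"no free FB of kind {kind!r}")
        self.available[kind] -= 1
        fb = FunctionBlock(name, kind, params)
        self.blocks[name] = fb
        return fb

    def connect(self, producer: str, consumer: str) -> None:
        # Point-to-point link: the consumer's input is driven by one producer.
        if producer not in self.blocks or consumer not in self.blocks:
            raise KeyError("both endpoints must be allocated first")
        if consumer in self.edges:
            raise ValueError(f"{consumer!r} already has a producer")
        self.edges[consumer] = producer

    def schedule(self) -> list[str]:
        # Every producer precedes its consumers; raises CycleError on a loop.
        graph = {name: set() for name in self.blocks}
        for consumer, producer in self.edges.items():
            graph[consumer].add(producer)
        return list(TopologicalSorter(graph).static_order())


# Usage: a hypothetical three-stage vision chain (blur -> edge -> binarize).
flp = MacroDatapath({"conv": 2, "threshold": 1, "histogram": 1})
flp.allocate("blur", "conv", kernel="gauss5x5")
flp.allocate("edges", "conv", kernel="sobel_x")
flp.allocate("binarize", "threshold", level=128)
flp.connect("blur", "edges")
flp.connect("edges", "binarize")
print(flp.schedule())  # ['blur', 'edges', 'binarize']
```

Modeling allocation against a fixed pool reflects the trade-off the FLP makes: an application is realizable only if it decomposes into the function kinds the hardware provides, which is where flexibility is bounded relative to an ILP.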
