Reducing the Energy Cost of Irregular Code Bases in Soft Processor Systems

This paper describes an architecture and FPGA synthesis tool chain for building specialized, energy-saving coprocessors called Irregular Code Energy Reducers (ICERs) for a wide range of unmodified C programs. FPGAs are increasingly used to build large-scale systems, and many large software systems contain relatively little code that is amenable to automatic, semi-automatic, or even manual parallelization. Whereas accelerator approaches have traditionally achieved energy benefits as a side effect from increasing performance via parallel execution, ICERs aim to achieve energy gains even on code with little exploitable parallelism. Traditional approaches to automatically generating accelerators from existing software rely on inferring parallel execution from serial code, so they face the same code analysis challenges as parallelizing compilers. In contrast, because the ICER approach targets energy rather than performance, it easily scales to large, irregular applications that are poor candidates for traditional acceleration. Our results show that, compared to a baseline system with soft processor cores, ICERs can reduce energy consumption by up to 9.5x for the code they target and 2.8x for whole applications.

[1]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[2]  Jason Cong,et al.  AutoPilot: A Platform-Based ESL Synthesis System , 2008 .

[3]  Ralph Wittig,et al.  Performance and power of cache-based reconfigurable computing , 2009, ISCA '09.

[4]  Joseph R. Cavallaro,et al.  Configurable LDPC Decoder Architectures for Regular and Irregular Codes , 2008, J. Signal Process. Syst..

[5]  John Wawrzynek,et al.  The Garp Architecture and C Compiler , 2000, Computer.

[6]  Nachiket Kapre,et al.  Accelerating SPICE Model-Evaluation using FPGAs , 2009, 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines.

[7]  Steven Swanson,et al.  Conservation cores: reducing the energy of mature computations , 2010, ASPLOS XV.

[8]  Marcus Randall,et al.  FPGA based custom computing machines for irregular problems , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.

[9]  Clifford Stein,et al.  Introduction to Algorithms, 2nd edition. , 2001 .

[10]  Henry Hoffmann,et al.  Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[11]  Russell Tessier,et al.  Application Specific Customization and Scalability of Soft Multiprocessors , 2009, 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines.

[12]  Philippe Coussy,et al.  High-Level Synthesis: from Algorithm to Digital Circuit , 2008 .

[13]  Frank Vahid,et al.  Warp Processing: Dynamic Translation of Binaries to FPGA Circuits , 2008, Computer.

[14]  Seth Copen Goldstein,et al.  Tartan: evaluating spatial computation for whole program execution , 2006, ASPLOS XII.

[15]  Andreas Moshovos,et al.  CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit , 2000, ISCA '00.

[16]  Scott A. Mahlke,et al.  PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators , 2002, J. VLSI Signal Process..

[17]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[18]  David J. Lau,et al.  Automated Generation of Hardware Accelerators with Direct Memory Access from ANSI/ISO Standard C Functions , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.