CIGAR: Application Partitioning for a CPU/Coprocessor Architecture

We present CIGAR, a methodology and development platform that facilitates the use of data-parallel coprocessors. With CIGAR, application developers use profiling tools to identify the parts of an application suited to data-parallel execution, determine which application data structures the coprocessor should host, prototype coprocessor execution of those parts, and use emulation to debug the correctness of the partitioned application. The CIGAR methodology is complemented by a CPU/FPGA prototyping platform that runs a fully functional version of the Linux operating system together with its associated development tools and libraries. To guide our work and evaluate its utility, we instrumented SPECint2006 applications to use coprocessors emulated by softcore processors embedded in the prototyping platform. We demonstrate, through examples, how a developer would use CIGAR to partition an application for a heterogeneous CPU/coprocessor environment.
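The workflow described in the abstract has three separable steps: find data-parallel hot spots from a profile, decide which data structures live on the coprocessor, and check the partitioned run against the original. The following is a minimal Python sketch of that idea under our own assumptions. A sampling profile is modeled as a list of function names. A function becomes an offload candidate when its share of samples meets a threshold. An `EmulatedCoprocessor` class stands in for the coprocessor, with a private memory reachable only through explicit `load`/`store` copies. All names (`select_offload_candidates`, `EmulatedCoprocessor`, `saxpy_kernel`, `check_partition`) and the threshold rule are hypothetical. They do not describe CIGAR's actual tools, which emulate coprocessors with softcore processors on an FPGA platform rather than with a Python model.

```python
"""Hypothetical sketch of a profile-then-partition workflow (not CIGAR's API)."""
from collections import Counter
from typing import Callable, Dict, List, Sequence


def select_offload_candidates(samples: Sequence[str], threshold: float) -> List[str]:
    """Return functions whose share of profile samples is at least `threshold`.

    Assumption: `samples` holds one function name per sampled program counter,
    as a sampling profiler might report. Results are ordered hottest first.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return []
    hot = [name for name, n in counts.items() if n / total >= threshold]
    return sorted(hot, key=lambda name: -counts[name])


class EmulatedCoprocessor:
    """Hypothetical stand-in for a coprocessor with its own local memory.

    Data structures the developer chooses to host are copied into `local_mem`;
    kernels see only that copy, so a structure that was never loaded causes a
    KeyError instead of being silently shared with the host.
    """

    def __init__(self) -> None:
        self.local_mem: Dict[str, list] = {}

    def load(self, name: str, data: list) -> None:
        self.local_mem[name] = list(data)  # explicit host -> coprocessor copy

    def store(self, name: str) -> list:
        return list(self.local_mem[name])  # explicit coprocessor -> host copy

    def run(self, kernel: Callable[[Dict[str, list]], None]) -> None:
        kernel(self.local_mem)  # kernel may touch coprocessor memory only


def saxpy_host(a: int, x: list, y: list) -> list:
    """Original (host-only) computation used as the correctness reference."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_kernel(a: int) -> Callable[[Dict[str, list]], None]:
    """Coprocessor version of the same loop, operating on hosted data."""
    def kernel(mem: Dict[str, list]) -> None:
        x, y = mem["x"], mem["y"]
        for i in range(len(y)):  # iterations are independent: data-parallel
            y[i] = a * x[i] + y[i]
    return kernel


def check_partition(a: int, x: list, y: list) -> bool:
    """Run the partitioned version under emulation and compare with the host."""
    reference = saxpy_host(a, x, y)
    cop = EmulatedCoprocessor()
    cop.load("x", x)  # developer decision: host x on the coprocessor
    cop.load("y", y)  # developer decision: host y on the coprocessor
    cop.run(saxpy_kernel(a))
    return cop.store("y") == reference
```

For example, with 70 of 100 samples in `saxpy`, a threshold of 0.5 selects only `saxpy`, and `check_partition(2, [1, 2, 3], [10, 20, 30])` returns `True` because the emulated run reproduces the host result `[12, 24, 36]`. Routing every coprocessor access through explicit copies is a deliberate choice in this sketch: a data structure the developer forgot to host fails loudly rather than working by accident.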
