Exploring the design space of LUT-based transparent accelerators

Instruction set customization accelerates the performance of applications by compressing the length of critical dependence paths and reducing the demands on processor resources. With instruction set customization, specialized accelerators are added to a conventional processor to atomically execute dataflow subgraphs. Accelerators that are exploited without explicit changes to the instruction set architecture of the processor are said to be transparent. Transparent acceleration relies on a light-weight hardware engine to dynamically generate control signals for the accelerator, using subgraphs delineated by a compiler. The design of transparent subgraph accelerators is challenging, as critical subgraphs need to be supported efficiently while maintaining area and timing constraints. Additionally, more complex accelerators require more sophisticated control generation engines. These factors must be carefully balanced. In this work, we investigate the design of subgraph accelerators using configurable lookup table structures. These designs provide an effective paradigm to execute a wide range of subgraphs involving arithmetic and logic operations. We describe why lookup table designs are effective, how they fit into a transparent acceleration framework, and evaluate the effectiveness of a wide range of de-signs using both simulation and logic synthesis.

[1]  James E. Smith,et al.  The performance potential of data dependence speculation and collapsing , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[2]  Peter G. Sassone,et al.  Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[3]  Sanjay J. Patel,et al.  rePLay: A Hardware Framework for Dynamic Optimization , 2001, IEEE Trans. Computers.

[4]  Scott A. Mahlke,et al.  Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[5]  Michael D. Smith,et al.  A high-performance microarchitecture with hardware-programmable functional units , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[6]  Majid Sarrafzadeh,et al.  Instruction generation and regularity extraction for reconfigurable processors , 2002, CASES '02.

[7]  Scott A. Mahlke,et al.  Processor Acceleration Through Automated Instruction Set Customization , 2003, MICRO.

[8]  Ing-jer Huang,et al.  Co-Synthesis of Instruction Sets and Microarchitectures , 1994 .

[9]  Quinn Jacobson,et al.  Instruction pre-processing in trace processors , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[10]  Darin Petkov,et al.  Automatic generation of application specific processors , 2003, CASES '03.

[11]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[12]  Michael J. Wirthlin,et al.  DISC: the dynamic instruction set computer , 1995, Optics East.

[13]  Israel Koren Computer arithmetic algorithms , 1993 .

[14]  Srivaths Ravi,et al.  Synthesis of custom processors based on extensible platforms , 2002, ICCAD 2002.

[15]  Scott A. Mahlke,et al.  An architecture framework for transparent instruction set customization in embedded processors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[16]  Brad L. Hutchings,et al.  A dynamic instruction set computer , 1995, Proceedings IEEE Symposium on FPGAs for Custom Computing Machines.

[17]  Tulika Mitra,et al.  Characterizing embedded applications for instruction-set extensible processors , 2004, Proceedings. 41st Design Automation Conference, 2004..

[18]  John Wawrzynek,et al.  Garp: a MIPS processor with a reconfigurable coprocessor , 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Cat. No.97TB100186).

[19]  Stamatis Vassiliadis,et al.  High-Performance 3-1 Interlock Collapsing ALU's , 1994, IEEE Trans. Computers.

[20]  Olivier Temam,et al.  From sequences of dependent instructions to functions: an approach for improving performance without ILP or speculation , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[21]  Paul Chow,et al.  The effect of reconfigurable units in superscalar processors , 2001, FPGA.

[22]  Andreas Moshovos,et al.  CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit , 2000, ISCA '00.

[23]  Harvey F. Silverman,et al.  Processor reconfiguration through instruction-set metamorphosis , 1993, Computer.

[24]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[25]  James E. Smith,et al.  Using dynamic binary translation to fuse dependent instructions , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[26]  Harold S. Stone,et al.  A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations , 1973, IEEE Transactions on Computers.

[27]  H. T. Kung,et al.  A Regular Layout for Parallel Adders , 1982, IEEE Transactions on Computers.

[28]  Paolo Ienne,et al.  Automatic application-specific instruction-set extensions under microarchitectural constraints , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[29]  Amir Roth,et al.  Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).