A framework for post-silicon realization of arbitrary instruction extensions on reconfigurable data-paths

In this paper we present a framework for realizing arbitrary instruction set extensions (IEs) that are identified post-silicon. The proposed framework has two components, viz., an IE synthesis methodology and the architecture of a reconfigurable data-path for realizing such IEs. The IE synthesis methodology ensures maximal utilization of resources on the reconfigurable data-path. In this context, we present the techniques used to realize IEs for applications that demand high throughput or that must process data streams. The reconfigurable hardware, called HyperCell, comprises a reconfigurable execution fabric, which is a collection of interconnected compute units. In a typical use case, HyperCell acts as a co-processor to a host and accelerates the execution of IEs that are defined post-silicon. We demonstrate the effectiveness of our approach by evaluating the performance of several well-known integer kernels realized as IEs on HyperCell. Our methodology for realizing IEs through HyperCells permits potentially all memory transactions to be overlapped with computations. By fully pipelining the data-path, we obtain a significant performance improvement for streaming applications over solutions based on general-purpose processors. (C) 2014 Elsevier B.V. All rights reserved.
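The throughput argument for streaming applications can be illustrated with a simple cycle-count model. The sketch below is a hypothetical back-of-the-envelope model, not the paper's evaluation. The stage latencies `load`, `compute` and `store`, the initiation interval `ii`, and the function names are all illustrative assumptions. It contrasts a data-path that finishes one item before starting the next with a fully pipelined one in which a new item enters every `ii` cycles. In the pipelined case, memory transactions for later items overlap with computation on earlier ones.

```python
# Hypothetical cycle-count model: illustrative assumptions only,
# not measurements or formulas taken from the paper.


def sequential_cycles(n_items: int, load: int, compute: int, store: int) -> int:
    """Each item is loaded, computed and stored before the next one starts,
    so no memory transaction overlaps with computation."""
    return n_items * (load + compute + store)


def pipelined_cycles(n_items: int, load: int, compute: int, store: int, ii: int = 1) -> int:
    """Fully pipelined data-path: a new item enters every `ii` cycles, so the
    loads/stores of later items overlap with computation on earlier items."""
    if n_items == 0:
        return 0
    latency = load + compute + store  # cycles for one item to traverse the pipeline
    return latency + (n_items - 1) * ii


if __name__ == "__main__":
    # Assumed example parameters: 1024 stream items, 4-cycle load, 10-cycle compute, 4-cycle store.
    n, ld, cp, st = 1024, 4, 10, 4
    seq = sequential_cycles(n, ld, cp, st)
    pipe = pipelined_cycles(n, ld, cp, st)
    print(f"sequential={seq} pipelined={pipe} speed-up={seq / pipe:.2f}")
```

With `ii = 1`, the speed-up in this model tends toward the single-item latency (`load + compute + store`) as the stream grows longer. A larger `ii`, for example one caused by sharing a compute unit between operations, lowers that ceiling to roughly `(load + compute + store) / ii`.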
