Reconfigurable Grid Alu Processor: Optimization and Design Space Exploration

Currently few architectural approaches propose new paths to raise the performance of conventional sequential instruction streams in the time of the billions transistor era. Many application programs could profit from processors that are able to speed up the execution of sequential applications beyond the performance of current super scalar processors. The Grid Alu Processor (GAP) is a runtime reconfigurable processor designed for the acceleration of a conventional sequential instruction stream without the need of recompilation. It comprises a super scalar processor front-end, a configuration unit, and an array of reconfigurable functional units (FUs), which is fully integrated into the pipeline. The configuration unit maps data dependent and independent instructions simultaneously at runtime into the array of FUs. This paper evaluates the GAP architecture and optimizes the architecture, the number of FUs, and the configuration layers implemented in the array. The simulations show a significant speed-up for sequential applications on GAP in comparison to an out-of-order super scalar simulator (Simple Scalar). The GAP simulator outperforms Simple Scalar on average by about 50% on the basic architecture and about 100% with an extended version including configuration layers.

[1]  Richard Murphy,et al.  On the Effects of Memory Latency and Bandwidth on Supercomputer Application Performance , 2007, 2007 IEEE 10th International Symposium on Workload Characterization.

[2]  Michael L. Scott,et al.  Hiding synchronization delays in a GALS processor microarchitecture , 2004, 10th International Symposium on Asynchronous Circuits and Systems, 2004. Proceedings..

[3]  Reiner W. Hartenstein Coarse grain reconfigurable architectures , 2001, Proceedings of the ASP-DAC 2001. Asia and South Pacific Design Automation Conference 2001 (Cat. No.01EX455).

[4]  Fadi J. Kurdahi,et al.  Design and Implementation of the MorphoSys Reconfigurable Computing Processor , 2000, J. VLSI Signal Process..

[5]  Lizy Kurian John,et al.  Scaling to the end of silicon with EDGE architectures , 2004, Computer.

[6]  Henry Hoffmann,et al.  The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.

[7]  Diederik Verkest,et al.  Energy aware interconnect exploration of coarse grained reconfigurable processors , 2005 .

[8]  Scott A. Mahlke,et al.  VEAL: Virtualized Execution Accelerator for Loops , 2008, 2008 International Symposium on Computer Architecture.

[9]  Henry Hoffmann,et al.  Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[10]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[11]  N. Ranganathan,et al.  A wire-delay scalable microprocessor architecture for high performance systems , 2003, 2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC..

[12]  Theo Ungerer,et al.  A Two-Dimensional Superscalar Processor Architecture , 2009, 2009 Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns.

[13]  Rudy Lauwereins,et al.  ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix , 2003, FPL.