Exposed Datapath for Efficient Computing

We introduce FlexCore, the first exemplar of a processor based on the FlexSoC processor paradigm. FlexCore uses an exposed datapath to increase performance: microbenchmarks show a speedup of a factor of two over a traditional five-stage pipeline built from the same functional units. We describe our approach to compiling for FlexCore. A flexible interconnect allows the FlexCore datapath to be dynamically reconfigured as a consequence of code generation, and specialized functional units can be introduced and used within the same architecture and compilation framework. The exposed datapath requires a wide control word. Our evaluation of two microbenchmarks confirms that this increases both instruction bandwidth and memory footprint, which calls for efficient instruction decoding as proposed in the FlexSoC paradigm.
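
As a rough illustration of why an exposed datapath leads to a wide control word, the following Python sketch counts the control bits needed when every functional-unit input is steered by its own interconnect select field. The unit names, port counts, and bit widths are hypothetical and are not taken from the FlexCore design; the sketch only shows how the width adds up and how it compares with a conventional 32-bit instruction.

```python
# Hypothetical model: each unit has some input ports (each needing an
# interconnect select field), some output ports, and some local control bits.
# None of these numbers come from the FlexCore paper.
UNITS = {
    #  name         (inputs, outputs, local control bits)
    "regfile":      (1, 2, 16),  # assumed: 3 x 5-bit addresses + write enable
    "alu":          (2, 1, 4),   # assumed: 4-bit operation select
    "load_store":   (2, 1, 2),   # assumed: read/write strobes
    "pc":           (1, 1, 2),   # assumed: jump/increment control
}


def select_bits(n_sources: int) -> int:
    """Bits needed to pick one of n_sources on a fully connected interconnect."""
    return max(1, (n_sources - 1).bit_length())


def control_word_width(units: dict) -> int:
    """Total control word width: local control plus one select field per input."""
    n_sources = sum(outputs for _, outputs, _ in units.values())
    per_input = select_bits(n_sources)
    width = 0
    for inputs, _, local_bits in units.values():
        width += local_bits + inputs * per_input
    return width


if __name__ == "__main__":
    width = control_word_width(UNITS)
    print(f"control word width: {width} bits")
    print(f"relative to a 32-bit instruction: {width / 32:.2f}x")
```

Even in this small hypothetical configuration the control word exceeds 32 bits, and each added functional unit widens it further, which is the growth in instruction bandwidth and memory footprint that the abstract refers to.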
