A Systematic Design Space Exploration Approach to Customising Multi-Processor Architectures: Exemplified Using Graphics Processors

A systematic approach to customising Homogeneous Multi-Processor (HoMP) architectures is described. The approach involves a novel design space exploration tool and a parameterisable system model. Post-fabrication customisation options for using reconfigurable logic with a HoMP are classified. We then describe how the approach is used to explore pre- and post-fabrication customisation options that optimise an architecture's critical paths. The approach and its steps are demonstrated using the architecture of a graphics processor. We also analyse on-chip and off-chip memory access for systems with one or more processing elements (PEs), and study how the number of threads per PE affects the amount of off-chip memory access and the number of cycles per output. It is shown that post-fabrication customisation of a graphics processor can provide up to a fourfold performance improvement at negligible area cost.
