Application development on hybrid systems

Hybrid systems that combine many different types of computing devices are attractive targets for high-performance applications. Chip multiprocessors, FPGAs, DSPs, and GPUs can readily be assembled into a hybrid system, but it is far from clear how applications can be deployed effectively on such a system. Coordinating multiple languages is awkward and error-prone, especially when the languages are as different as hardware description languages and software programming languages. In addition, implementing communication mechanisms between different device types unnecessarily lengthens development time. The problem is compounded because, to be effective, the application developer needs performance data about the application early in the design cycle. We describe an application development environment targeted specifically at hybrid systems, which supports data-flow semantics between application kernels deployed on a variety of device types. A key feature of the environment is that it provides performance estimates (via simulation) before the application is deployed on a physical system.
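To make the idea of pre-deployment performance estimates concrete, the toy model below simulates a linear data-flow pipeline whose kernels are mapped to different devices. It is a minimal sketch under stated assumptions, not the simulator of the environment described here: it assumes every item is ready at time 0, each kernel processes one item at a time in FIFO order, buffers between kernels are unbounded, and each kernel and inter-device link has a fixed per-item service time or delay. The names `Kernel`, `simulate_pipeline`, and `steady_state_throughput`, as well as the device labels and timings, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    device: str          # hypothetical label, e.g. "CPU", "FPGA", "GPU"
    service_time: float  # assumed time to process one data item on that device


def simulate_pipeline(kernels, link_latencies, n_items):
    """Return the time at which each item leaves the last kernel.

    Hypothetical model: linear pipeline, all items ready at time 0, each kernel
    handles one item at a time in FIFO order, unbounded buffers, and
    link_latencies[i] is a fixed delay from kernel i to kernel i + 1.
    """
    if len(link_latencies) != len(kernels) - 1:
        raise ValueError("need exactly one link between each pair of adjacent kernels")
    free_at = [0.0] * len(kernels)  # time at which each kernel becomes idle
    completions = []
    for _ in range(n_items):
        t = 0.0  # item is available at the source at time 0
        for s, kernel in enumerate(kernels):
            if s > 0:
                t += link_latencies[s - 1]  # cross the inter-device link
            start = max(t, free_at[s])      # wait for both the item and the kernel
            t = start + kernel.service_time
            free_at[s] = t
        completions.append(t)
    return completions


def steady_state_throughput(kernels):
    """Items per time unit once the pipeline is full; set by the slowest kernel."""
    return 1.0 / max(k.service_time for k in kernels)


# Usage with made-up timings (not measured data).
pipeline = [
    Kernel("parse", "CPU", 20.0),
    Kernel("filter", "FPGA", 5.0),
    Kernel("score", "GPU", 10.0),
]
done = simulate_pipeline(pipeline, [1.0, 1.0], n_items=4)
print(done)                               # [37.0, 57.0, 77.0, 97.0]
print(steady_state_throughput(pipeline))  # 0.05
```

In this sketch the first item's latency (37) includes every kernel and link delay, while successive items leave 20 time units apart, the service time of the slowest kernel. Even a crude model of this kind shows which device mapping forms the bottleneck before any hardware is built.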
