As Reconfigurable Computing (RC) closes its sixth decade, significant improvements have made the technology a competitor to application-specific integrated circuits (ASICs). Because a field programmable gate array (FPGA) operates at significantly lower clock speeds than a general purpose processor (GPP), the developer must exploit every avenue available to attain a speedup on a heterogeneous computer. Achieving a significant speedup is what makes the RC application development process worthwhile: the developer gains greater computational power at a lower cost than with a traditional ASIC. This speedup comes primarily from pipelining and parallelizing processes on the FPGA. In addition to the traditional "three P's," 1 this paper highlights another speedup avenue, true multilevel parallelism. In particular, it demonstrates this concept with a threaded programming model that allows the GPP and the FPGA to run simultaneously. The method is realized through a threaded dot product on a heterogeneous computer.
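As a rough illustration of the threaded model described above, the sketch below splits a dot product between a GPP thread and a second thread that stands in for the FPGA portion. It is a minimal pthreads example under stated assumptions, not the paper's implementation: fpga_dot_partial() is a hypothetical placeholder that runs on the host where a vendor-specific FPGA offload call would go.

/* Minimal sketch of the threaded partitioning idea: one pthread computes a
 * partial dot product on the GPP while a second thread handles the portion
 * that would be offloaded to the FPGA.  fpga_dot_partial() is a hypothetical
 * stand-in; a real design would replace its body with a vendor-specific
 * call that streams data to the reconfigurable fabric. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

typedef struct {
    const double *x;
    const double *y;
    size_t start, end;
    double partial;
} dot_args;

/* Host (GPP) portion of the dot product. */
static void *cpu_dot_partial(void *p)
{
    dot_args *a = (dot_args *)p;
    double sum = 0.0;
    for (size_t i = a->start; i < a->end; ++i)
        sum += a->x[i] * a->y[i];
    a->partial = sum;
    return NULL;
}

/* Placeholder for the FPGA portion: it runs on the host here so the sketch
 * is self-contained, but it marks where the accelerator call would go. */
static void *fpga_dot_partial(void *p)
{
    return cpu_dot_partial(p);   /* assumption: replace with an FPGA offload */
}

int main(void)
{
    double *x = malloc(N * sizeof *x);
    double *y = malloc(N * sizeof *y);
    for (size_t i = 0; i < N; ++i) { x[i] = 1.0; y[i] = 2.0; }

    /* Split the vectors between the GPP thread and the FPGA thread so both
     * halves of the heterogeneous machine work simultaneously. */
    dot_args cpu  = { x, y, 0,     N / 2, 0.0 };
    dot_args fpga = { x, y, N / 2, N,     0.0 };

    pthread_t t_cpu, t_fpga;
    pthread_create(&t_cpu,  NULL, cpu_dot_partial,  &cpu);
    pthread_create(&t_fpga, NULL, fpga_dot_partial, &fpga);
    pthread_join(t_cpu,  NULL);
    pthread_join(t_fpga, NULL);

    printf("dot = %f\n", cpu.partial + fpga.partial);
    free(x);
    free(y);
    return 0;
}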
[1] Gerald Estrin et al., "Organization of Computer Systems: The Fixed Plus Variable Structure Computer," 1960.
[2] Khalid H. Abed et al., "Design Heuristics for Mapping Floating-Point Scientific Computational Kernels onto High Performance Reconfigurable Computers," J. Comput., 2009.
[3] Viktor K. Prasanna et al., "Mapping Sparse Matrix Scientific Applications onto FPGA-Augmented Reconfigurable Supercomputers," 2006.
[4] Robert J. Harrison et al., "A Pipelined and Parallel Architecture for Quantum Monte Carlo Simulations on FPGAs," VLSI Design, 2010.
[5] Robert J. Harrison et al., "FPGA Acceleration of a Quantum Monte Carlo Application," Parallel Comput., 2008.