Coprocessor Acceleration for Domain-Specific Computing

There is a growing trend to use coprocessors to offload and accelerate domain-specific applications in order to obtain significant performance improvement and energy/power reductions. Two important coprocessor components in the heterogeneous system are the GPU and F-PGA. GPU (graphics processing unit) is increasingly used as a data-parallel coprocessor for general computations. The newest GPU has a much larger number of cores (compared to CPU) and very high peak FLOPS. FPGA (field programmable gate array), on the other hand, allows users to customize, at fine-grain level, the computational data path and memory hierarchy according to the exact need of the applications. FPGA excels in integer operations and bit-level operations. The thesis starts with several coprocessor acceleration examples for our focus application domains: the first domain is on VLSICAD algorithms and the second is on computational medical imaging. We detail application acceleration examples in the domains including lithography simulation for IC manufacturing, medical image reconstruction using compressive sensing, and medical image registration using fluid models. Both GPU-accelerated versions and FPGA-accelerated versions have been implemented. Based on these implementations, we then analyze the performance and energy trade-offs, the interaction between the diverse application requirements and a spectrum of hardware systems, and how those domain-specific coprocessor acceleration case studies further bring us insights for domain-specific architecture innovations. In the end, we showcase an example for collaborative execution on the heterogeneous platform. Different scheduling policies are needed to optimize performance or energy. The thesis concludes as we present reusable architecture templates and realizations for futuristic accelerator-rich CMPs.