On the automatic generation of GPU-oriented software applications from RTL IPs

Graphics processing units (GPUs) have been explored as a new computing paradigm for accelerating computation-intensive applications. In particular, combining GPUs with CPUs has proved to be an effective way to accelerate software execution: a few CPU cores optimized for serial processing work alongside many smaller GPU cores designed for massively parallel computation. Moreover, driven by the need for low power consumption as well as high performance, a recent trend is to integrate GPUs and CPUs on a single die (e.g., AMD Fusion, Intel Sandy Bridge, NVIDIA Tegra). Their good trade-off between computing capability and power consumption makes integrated GPUs a promising alternative for accelerating a wide range of embedded software applications. Nevertheless, algorithms must be redesigned to take advantage of these architectures, and manual parallelization often yields unsatisfactory results. This paper presents a methodology for automatically generating software applications for GPUs by reusing existing, preverified register-transfer level (RTL) intellectual property (IP) cores. The methodology exploits the intrinsic parallelism of RTL IPs (such as process concurrency and pipelined micro-architectures) to generate a parallel software implementation of their functionality. The experimental results show that the performance obtained by running the RTL functionality as software applications on GPUs exceeds that of the same RTL code mapped onto a hardware accelerator.
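To illustrate why RTL process concurrency translates naturally into parallel software, here is a minimal sketch. It is not the paper's generation flow. It assumes a hypothetical three-stage pipelined IP whose clocked processes each drive exactly one register. Within a clock cycle, every process reads only the current register values, so all processes can be evaluated concurrently, much as one GPU work-item per process would. The new values are then committed together at the clock edge. The names `PROCESSES`, `clock_cycle`, and `run` are illustrative only. A CPU thread pool stands in for GPU work-items, and it shows the dependency structure rather than any real speedup, because CPython threads do not run these lambdas in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

State = Dict[str, int]

# Hypothetical RTL IP: a 3-stage pipeline. Each entry models one clocked
# process: it reads the *current* register values (and the primary input x)
# and returns the next value of the single register it drives.
PROCESSES: Dict[str, Callable[[State, int], int]] = {
    "s1": lambda cur, x: x + 1,          # stage 1: register the incremented input
    "s2": lambda cur, x: cur["s1"] * 2,  # stage 2: double the stage-1 register
    "s3": lambda cur, x: cur["s2"] - 3,  # stage 3: subtract 3 from the stage-2 register
}


def clock_cycle(cur: State, x: int, pool: ThreadPoolExecutor) -> State:
    # Evaluate phase: no process reads another's next-state value, so all
    # processes are independent and may run concurrently.
    futures = {reg: pool.submit(fn, cur, x) for reg, fn in PROCESSES.items()}
    # Commit phase: every register takes its new value at the same clock edge.
    return {reg: f.result() for reg, f in futures.items()}


def run(inputs: List[int]) -> List[int]:
    # All pipeline registers are reset to 0; the output port is register s3.
    state: State = {"s1": 0, "s2": 0, "s3": 0}
    outputs: List[int] = []
    with ThreadPoolExecutor() as pool:
        for x in inputs:
            state = clock_cycle(state, x, pool)
            outputs.append(state["s3"])
    return outputs
```

For example, `run([5, 7, 9, 0, 0])` returns `[-3, -3, 9, 13, 17]`. The first two outputs reflect the reset values still draining through the pipeline. From the third cycle onward, each input `x` emerges as `(x + 1) * 2 - 3`, three cycles after it was applied, while all three stages work on different data items in the same cycle.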
