Trigeneous Platforms for Energy Efficient Computing of HPC Applications

In this paper, we present two novel real-time heterogeneous platforms that combine three kinds of devices (CPU, GPU, and FPGA), which we call trigeneous platforms, for efficiently accelerating computation-intensive applications in both the high-performance computing and the embedded system domains. The first platform targets the high-performance computing domain and is implemented entirely on a workstation consisting of an Intel Xeon E5 processor, an NVIDIA Tesla GPU, and a Xilinx Virtex-7 FPGA. The second platform is built to achieve high performance in the real-time embedded system domain and uses a Xilinx Zynq and an NVIDIA Jetson TK1 board. Communication in these platforms is performed using PCIe Gen3 and PCIe Gen2 cards, respectively. We conducted experiments using five real-time, high-throughput, computation- and data-intensive applications: cone beam computed tomography, face recognition, HEVC UHD decoding, number plate recognition, and motion tracking. All applications are mapped to the devices of the proposed trigeneous platforms based on the energy efficiency of each task on each device, while also minimizing data transfers and maximizing parallelism. With the trigeneous platforms, we achieve an average speed-up of 21x over a CPU-GPU platform, 24x over a CPU-FPGA platform, and 70x over execution on a quad-core CPU alone. The proposed trigeneous platforms save 43%, 56%, and 64% of the energy compared to the CPU-GPU, CPU-FPGA, and quad-core CPU platforms, respectively. Furthermore, we implemented these applications in a single programming language (OpenCL) on the trigeneous platforms and achieved an average speed-up of 6x over the quad-core setup.
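The mapping criterion described above (per-task energy efficiency on each device, balanced against data transfers) can be illustrated with a minimal sketch. This is not the mapping procedure used in the paper, which is not detailed here; it is a simple dynamic program over a linear task pipeline under stated assumptions. The task names, the energy values in `ENERGY_J`, the fixed `TRANSFER_J` cost, and the function `map_pipeline` are all hypothetical, and the sketch ignores parallelism entirely.

```python
# Hypothetical per-task energy (joules) on each device; illustrative values only,
# not measurements from the paper.
ENERGY_J = {
    "preprocess": {"CPU": 4.0, "GPU": 2.5, "FPGA": 1.0},
    "filter": {"CPU": 9.0, "GPU": 2.0, "FPGA": 3.0},
    "classify": {"CPU": 6.0, "GPU": 2.2, "FPGA": 2.0},
}
# Assumed fixed energy cost (joules) whenever consecutive tasks run on different devices.
TRANSFER_J = 0.5


def map_pipeline(tasks, energy, transfer_cost):
    """Pick one device per task in a linear pipeline so that the sum of
    task energy and inter-device transfer cost is minimal."""
    devices = list(energy[tasks[0]])
    # best[d] = (lowest total cost so far, device path) with the latest task on d
    best = {d: (energy[tasks[0]][d], [d]) for d in devices}
    for task in tasks[1:]:
        nxt = {}
        for d in devices:
            # Cheapest way to arrive at device d, paying a transfer if the device changes.
            cost, path = min(
                (
                    (prev_cost + (0.0 if p == d else transfer_cost), prev_path)
                    for p, (prev_cost, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            nxt[d] = (cost + energy[task][d], path + [d])
        best = nxt
    total, path = min(best.values(), key=lambda item: item[0])
    return dict(zip(tasks, path)), total


if __name__ == "__main__":
    mapping, total = map_pipeline(list(ENERGY_J), ENERGY_J, TRANSFER_J)
    print(mapping, total)
```

With these made-up numbers, picking the most efficient device for each task in isolation would place `classify` on the FPGA, but the transfer cost makes it cheaper overall to keep `classify` on the GPU next to `filter`. That is the kind of trade-off between per-task efficiency and data movement that the mapping criterion implies.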
