OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures

The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge is evaluating the efficacy of these parallel architectures and identifying the architectural innovations that ultimately benefit applications. Addressing this challenge requires benchmarks that capture the execution patterns (i.e., dwarfs or motifs) of present and future applications, so that those benchmarks can guide future hardware design. Furthermore, we desire a common programming model for the benchmarks that facilitates code portability across a wide variety of processors (e.g., CPU, APU, GPU, FPGA, DSP) and computing environments (e.g., embedded, mobile, desktop, server). To this end, we present the latest release of OpenDwarfs, a benchmark suite that currently realizes the Berkeley dwarfs in OpenCL, a vendor-agnostic, open-standard language for parallel computing. Using OpenDwarfs, we characterize a diverse set of modern fixed and reconfigurable parallel platforms: multi-core CPUs, discrete and integrated GPUs, the Intel Xeon Phi coprocessor, and an FPGA. We describe the computation and communication patterns exposed by a representative set of dwarfs, obtain relevant profiling data and execution information, and draw conclusions that highlight the complex interplay between the dwarfs’ patterns and the underlying hardware architecture of modern parallel platforms.
