CAP Bench: a benchmark suite for performance and energy evaluation of low‐power many‐core processors

The constant need for faster and more energy‐efficient processors has stimulated the development of new architectures, such as low‐power many‐core architectures. Researchers studying these architectures are challenged both by the peculiar characteristics of some components, such as networks‐on‐chip, and by the lack of specific tools to evaluate their performance. In this context, this paper presents a benchmark suite to evaluate state‐of‐the‐art low‐power many‐core architectures such as the Kalray MPPA‐256 processor, which features 256 compute cores on a single chip. The benchmark was designed and used to highlight important aspects and details that need to be considered when developing parallel applications for emerging low‐power many‐core architectures. The paper demonstrates that the benchmark offers a diverse suite of programs with regard to parallel patterns, job types, communication intensity, and task load strategies, making it suitable for a broad understanding of the performance and energy consumption of the MPPA‐256 and upcoming many‐core architectures. Copyright © 2016 John Wiley & Sons, Ltd.
