Optimized hardware for suboptimal software: The case for SIMD-aware benchmarks

Evaluating new architectural proposals against real applications is a necessary step in academic research. However, providing benchmarks that keep up with architectural changes has become a real challenge. If benchmarks do not exercise the most common architectural features, architects may end up underestimating or overestimating the impact of their contributions. In this work, we extend the PARSEC benchmark suite with SIMD capabilities to provide an enhanced evaluation framework for new academic and industrial proposals. We then perform a detailed energy and performance evaluation of this commonly used application set on different platforms (Intel® and ARM® processors). Our results show how SIMD code alters scalability, energy efficiency and hardware requirements. Improvements in performance and energy efficiency, of up to 50×, depend greatly on the fraction of code that can actually be vectorized. Our enhancements are based on a custom-built wrapper library, compatible with SSE, AVX and NEON, that facilitates general vectorization. We aim to distribute the source code to strengthen the evaluation process of new proposals for computing systems.
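Why the vectorizable fraction matters so much can be illustrated with Amdahl's law. This is an assumed, standard model used here only for illustration; the abstract does not say which analysis the paper itself uses. If a fraction f of the original execution time is vectorized and that portion runs s times faster, the overall speedup S is:

```latex
S(f, s) = \frac{1}{(1 - f) + \dfrac{f}{s}},
\qquad
\lim_{s \to \infty} S(f, s) = \frac{1}{1 - f}
```

For example, a 128-bit SSE or NEON register holds four single-precision floats, so take s = 4 as an idealized gain on the vectorized part. With f = 0.9 this gives S = 1 / (0.1 + 0.225) ≈ 3.08, and even an unbounded SIMD speedup could not push S beyond 1 / 0.1 = 10.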

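The abstract mentions a custom-built wrapper library compatible with SSE, AVX and NEON, but does not describe its API. The following is a minimal sketch of the general idea, assuming compile-time dispatch on the compiler-predefined macros `__AVX__`, `__SSE__` and `__ARM_NEON`, with a scalar fallback. The names `vec_t`, `VEC_WIDTH`, `vec_load`, `vec_store`, `vec_splat`, `vec_add`, `vec_mul` and `saxpy` are hypothetical and are not taken from the paper's library.

```c
/* Hypothetical portable SIMD wrapper (illustrative names, not the paper's API).
 * One small interface is mapped at compile time onto AVX, SSE, NEON,
 * or plain scalar code. Only single-precision float is covered. */
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
typedef __m256 vec_t;
#define VEC_WIDTH 8 /* 256-bit register = 8 floats */
static inline vec_t vec_load(const float *p)    { return _mm256_loadu_ps(p); }
static inline void  vec_store(float *p, vec_t v) { _mm256_storeu_ps(p, v); }
static inline vec_t vec_splat(float s)           { return _mm256_set1_ps(s); }
static inline vec_t vec_add(vec_t a, vec_t b)    { return _mm256_add_ps(a, b); }
static inline vec_t vec_mul(vec_t a, vec_t b)    { return _mm256_mul_ps(a, b); }
#elif defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 vec_t;
#define VEC_WIDTH 4 /* 128-bit register = 4 floats */
static inline vec_t vec_load(const float *p)    { return _mm_loadu_ps(p); }
static inline void  vec_store(float *p, vec_t v) { _mm_storeu_ps(p, v); }
static inline vec_t vec_splat(float s)           { return _mm_set1_ps(s); }
static inline vec_t vec_add(vec_t a, vec_t b)    { return _mm_add_ps(a, b); }
static inline vec_t vec_mul(vec_t a, vec_t b)    { return _mm_mul_ps(a, b); }
#elif defined(__ARM_NEON)
#include <arm_neon.h>
typedef float32x4_t vec_t;
#define VEC_WIDTH 4 /* 128-bit Q register = 4 floats */
static inline vec_t vec_load(const float *p)    { return vld1q_f32(p); }
static inline void  vec_store(float *p, vec_t v) { vst1q_f32(p, v); }
static inline vec_t vec_splat(float s)           { return vdupq_n_f32(s); }
static inline vec_t vec_add(vec_t a, vec_t b)    { return vaddq_f32(a, b); }
static inline vec_t vec_mul(vec_t a, vec_t b)    { return vmulq_f32(a, b); }
#else
typedef float vec_t;
#define VEC_WIDTH 1 /* no SIMD available: one element at a time */
static inline vec_t vec_load(const float *p)    { return *p; }
static inline void  vec_store(float *p, vec_t v) { *p = v; }
static inline vec_t vec_splat(float s)           { return s; }
static inline vec_t vec_add(vec_t a, vec_t b)    { return a + b; }
static inline vec_t vec_mul(vec_t a, vec_t b)    { return a * b; }
#endif

/* Example kernel written once against the wrapper: y[i] = a * x[i] + y[i]. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    const vec_t va = vec_splat(a); /* broadcast a into every lane */
    size_t i = 0;

    /* Vector loop: processes VEC_WIDTH elements per iteration. */
    for (; i + VEC_WIDTH <= n; i += VEC_WIDTH) {
        vec_t vy = vec_add(vec_mul(va, vec_load(x + i)), vec_load(y + i));
        vec_store(y + i, vy);
    }

    /* Scalar tail: the n % VEC_WIDTH elements left over. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Built with `-mavx`, the same `saxpy` source uses 256-bit AVX; a default x86-64 build falls back to 128-bit SSE, and an ARM build with NEON uses `float32x4_t`. Unaligned loads and stores are used so callers need not align their arrays, at some possible cost in speed.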