Tuning tasks, granularity, and scratchpad size for energy efficiency

Co-design adapts hardware and software in tandem toward a shared optimization goal. It is widely considered necessary for overcoming the performance and energy-efficiency challenges of Exascale systems and applications, and for ensuring that both scientists and system architects understand the tradeoffs and implications of their design choices. In this paper we evaluate the energy efficiency of an Exascale strawman architecture for two proxy applications from DoE co-design centers: CoMD and HPGMG. The applications were rewritten for the Open Community Runtime (OCR), a programming model and runtime system for Exascale research, and evaluated on a functional simulator developed by Intel. Specifically, we investigated code variants and system configurations to explore co-design tradeoffs and to gain insight into the interplay between application behavior and memory configuration. We observed that in CoMD, exploiting force symmetry is as effective at reducing energy consumption as it is at reducing the amount of computation, even though it requires atomic operations or scheduling support. Reducing task granularity incurs an overhead that outweighs the potential benefit of increased locality. For HPGMG, we found that changing the scratchpad size made no significant difference in energy consumption, suggesting that the code largely does not exploit the local memory; finer blocking should be explored to weigh the gain in locality against the overhead it introduces.
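To make the force-symmetry tradeoff concrete, the sketch below contrasts a baseline pair loop, which evaluates every ordered pair, with a symmetric loop that uses Newton's third law to evaluate each unordered pair once and apply the force to both atoms. This is an illustrative sketch, not code from the OCR CoMD port: the function names, the toy force law, and the example positions are all hypothetical, and the `f[j]` update is the spot that would need atomics (or owner-computes scheduling) if the outer loop were split into concurrent tasks.

```python
def pair_force(ri, rj):
    """Toy pairwise force proportional to the separation vector
    (a placeholder for the real Lennard-Jones/EAM kernel)."""
    return [rj[k] - ri[k] for k in range(3)]

def forces_full(pos):
    """Baseline: every ordered pair (i, j) is evaluated, so each
    pairwise force is computed twice."""
    n = len(pos)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                fij = pair_force(pos[i], pos[j])
                for k in range(3):
                    f[i][k] += fij[k]
    return f

def forces_symmetric(pos):
    """Force symmetry: each unordered pair is evaluated once and the
    result applied to both atoms, halving the force computations."""
    n = len(pos)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            fij = pair_force(pos[i], pos[j])
            for k in range(3):
                f[i][k] += fij[k]
                f[j][k] -= fij[k]  # concurrent tasks would race here
    return f
```

The two variants produce the same total forces; the symmetric version does half the force evaluations, which is where both the computation and energy savings come from, at the cost of the cross-task write to `f[j]`.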
