A Study of Energy and Locality Effects Using Space-Filling Curves

Energy is becoming an increasingly important driver of the operating cost of HPC systems, adding yet another facet to the challenge of producing efficient code. In this paper, we investigate the energy implications of trading computation for locality by applying Hilbert and Morton space-filling curves to dense matrix-matrix multiplication. The advantage of these curves is that they exhibit an inherent tiling effect without requiring architecture-specific tuning. Accessing the matrices in the order determined by a space-filling curve improves locality, at the price of extra index computation for every access. We find that the index computation overhead of the Morton curve is balanced by its gains in locality and energy efficiency, while the overhead of the Hilbert curve outweighs its improvements on our test system.
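
To make the trade-off between index computation and locality concrete, the following is a minimal sketch of Morton (Z-order) indexing applied to a naive matrix product. It is an illustration under stated assumptions, not the kernel evaluated in the paper: the function names (`dilate16`, `morton_index`, `to_morton`, `matmul_morton`) are hypothetical, the matrices are assumed square with a power-of-two dimension of at most 65536, and the bit-dilation masks are the standard ones for interleaving two 16-bit integers.

```python
def dilate16(x: int) -> int:
    """Spread the low 16 bits of x onto the even bit positions of a 32-bit word."""
    x &= 0x0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def morton_index(row: int, col: int) -> int:
    """Interleave row and column bits: row bits go to odd positions, column bits to even ones."""
    return (dilate16(row) << 1) | dilate16(col)


def to_morton(matrix):
    """Copy a square, power-of-two-sized nested list into a flat Morton-ordered list."""
    n = len(matrix)
    flat = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            flat[morton_index(i, j)] = matrix[i][j]
    return flat


def matmul_morton(a, b, n):
    """Naive triple loop over Morton-ordered operands (assumption: n is a power of two).

    Every element access pays for an index computation; this is the
    overhead that is traded against the improved locality of the layout.
    """
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[morton_index(i, k)] * b[morton_index(k, j)]
            c[morton_index(i, j)] = acc
    return c
```

In this layout every aligned 2x2, 4x4, 8x8, ... block occupies a contiguous range of the flat array, which is the source of the tiling effect at all scales. A Hilbert index gives a similar blocked layout but needs a more involved computation per access, which is the overhead the abstract refers to.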
