Modeling power and energy consumption of dense matrix factorizations on multicore processors

In this paper, we propose a model for the energy consumption of the concurrent execution of three key dense matrix factorizations on a multicore processor, with task parallelism leveraged via the Symmetric Multi-Processing Superscalar (SMPSs) runtime. Our model decomposes the power dissipation into system, static, and dynamic components, with the first two estimated from basic off-line experiments. The dynamic power, on the other hand, requires significantly more care, and we introduce a contention-aware model that accounts for the variability of power consumption due to memory contention. Experimental results on an Intel Xeon E5504 processor with four cores, using an internal power meter that samples the power drawn by the mainboard at a frequency of 1 kHz, show the reliability of the energy model for the Cholesky, LU, and QR factorizations on this platform. Copyright © 2013 John Wiley & Sons, Ltd.
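As a rough illustration of the decomposition described in the abstract, the following sketch sums hypothetical system, static, and dynamic power terms and integrates a power trace sampled at 1 kHz to obtain energy. The function names and all numeric values are assumptions chosen for illustration, not figures from the paper, and the contention-aware dynamic model itself is not reproduced here.

```python
SAMPLE_RATE_HZ = 1000.0  # 1 kHz sampling rate, as stated in the abstract


def total_power(p_system, p_static, p_dynamic):
    """Total power (watts) as the sum of the three components."""
    return p_system + p_static + p_dynamic


def energy_from_samples(power_samples_w, sample_rate_hz=SAMPLE_RATE_HZ):
    """Approximate energy (joules) from equally spaced power samples.

    Uses a simple rectangle rule: each sample is held for one period.
    """
    dt = 1.0 / sample_rate_hz
    return sum(power_samples_w) * dt


# Hypothetical values, for illustration only (not measurements from the paper).
p = total_power(p_system=50.0, p_static=10.0, p_dynamic=20.0)
trace = [p] * 2000  # two seconds of constant power sampled at 1 kHz
energy_j = energy_from_samples(trace)
```

In a real setting the dynamic term would vary over time with the number of active cores and the degree of memory contention, so the trace would not be constant as it is in this toy example.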
