Time and Energy Performance of Parallel Systems with Hierarchical Memory

In this paper we analyze the impact of memory hierarchies on the time-energy trade-off in parallel computations. Contemporary computing systems have deep memory hierarchies whose levels differ significantly in speed and power consumption. As a result, nonlinear phenomena emerge in processing time and energy usage as the size of the computation grows. We formalize this nonlinear dependence of time and energy on problem size and verify it by measurements on practical computer systems. The model is then applied to formulate the problem of scheduling parallel processing of divisible loads for minimum time and minimum energy. Divisible load theory is a scheduling and performance model of data-parallel applications. Mathematical programming is exploited to solve the scheduling problem. The trade-off between energy and schedule length is analyzed, and again nonlinear relationships between the two criteria are observed. Further performance analysis reveals that energy consumption and schedule length are governed by a complex interplay between the costs and speeds of in-core and out-of-core computation, communication delays, and the cost of activating new machines.
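
To make the modeling idea concrete, below is a minimal Python sketch of the kind of piecewise model the abstract describes: processing time and energy grow nonlinearly with load size because the computing rate drops once a chunk spills out of core memory, and the minimum schedule length for a divisible load is found by a simple feasibility bisection. This is an illustration, not the paper's formulation; all names, rates, memory capacities, and power figures are hypothetical, and communication delays and machine-activation costs are omitted for brevity.

```python
# Hypothetical piecewise time/energy model for divisible load processing.
# Each machine is a tuple (core_cap, a_core, a_ooc, startup):
#   core_cap [bytes]  - in-core memory capacity
#   a_core   [s/byte] - processing rate while the chunk fits in core
#   a_ooc    [s/byte] - slower rate for the out-of-core overflow
#   startup  [s]      - fixed activation delay

def proc_time(alpha, core_cap, a_core, a_ooc, startup):
    """Time to process a load of size alpha on one machine (piecewise linear)."""
    in_core = min(alpha, core_cap)
    overflow = max(alpha - core_cap, 0.0)
    return startup + a_core * in_core + a_ooc * overflow

def energy(alpha, core_cap, a_core, a_ooc, startup, power):
    """Energy = average power drawn while busy times busy time."""
    return power * proc_time(alpha, core_cap, a_core, a_ooc, startup)

def max_load_within(T, core_cap, a_core, a_ooc, startup):
    """Largest load one machine can finish by deadline T (inverse of proc_time)."""
    if T <= startup:
        return 0.0
    budget = T - startup
    if budget <= a_core * core_cap:
        return budget / a_core
    return core_cap + (budget - a_core * core_cap) / a_ooc

def min_schedule_length(V, machines, eps=1e-6):
    """Bisection on schedule length T: T is feasible iff the machines
    can jointly absorb the whole load V by time T."""
    lo, hi = 0.0, 1.0
    while sum(max_load_within(hi, *m) for m in machines) < V:
        hi *= 2.0
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if sum(max_load_within(mid, *m) for m in machines) >= V:
            hi = mid
        else:
            lo = mid
    return hi

if __name__ == "__main__":
    # Two hypothetical machines and a 12 GB divisible load.
    machines = [(4e9, 1e-9, 8e-9, 0.5), (2e9, 2e-9, 1e-8, 0.3)]
    V = 1.2e10
    T = min_schedule_length(V, machines)
    alphas = [max_load_within(T, *m) for m in machines]
    powers = [90.0, 60.0]  # assumed busy power per machine [W]
    E = sum(energy(a, *m, power=p) for a, m, p in zip(alphas, machines, powers))
    print(f"schedule length ~ {T:.2f} s, chunks = {alphas}, energy ~ {E:.0f} J")
```

Because each machine's time function is piecewise linear in its chunk size, the aggregate time and energy are nonlinear in the total load V: once the combined in-core capacity is exceeded, additional load is absorbed at the slower out-of-core rates, which is the kind of effect the paper's mathematical programming formulation captures together with communication and activation costs.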
