Program Scalability Analysis for HPC Cloud: Applying Amdahl's Law to NAS Benchmarks

The availability of high performance computing (HPC) clouds requires scalability analysis of parallel programs across multiple different environments in order to realize the promised economic benefits. Unlike traditional HPC application performance studies that aim to predict the performance of like-kind processors, this paper reports an instrumentation-assisted complexity analysis method, based on the Amdahl's Law framework, for program scalability analysis across different HPC environments. We show that program instrumentation enables Gustafson's scaled-speedup formulation to quantify the elusive serial fraction in Amdahl's Law. We find that prediction results are not trustworthy unless communication time is separated from computing time. We demonstrate a methodology that transforms asymptotic complexity models into timing models in order to separate communication time and to identify the optimal degree of parallelism. A traditional HPC cluster and a private HPC cloud are used to validate the proposed methodology by showing the feasibility of optimal parallel processing and by scalability analysis of five NAS benchmarks. Our results show that either cloud or cluster can be exploited effectively if the application can adapt dynamically to changing processing conditions. As we dig deeper into performance analysis myths, the "scalability limit" appears to say less than its common interpretation suggests about applications themselves, and more about the inadequacy of our programming habits and architecture support.
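The contrast the abstract draws between fixed-size and scaled speedup, and the role of a separate communication term, can be sketched numerically. The following is a minimal illustration, not the paper's actual timing models; the linear communication-cost term `c*p` and the constants in the usage section are assumptions chosen for demonstration:

```python
def amdahl_speedup(f, p):
    """Amdahl's Law: fixed-size speedup on p processors when a
    fraction f of the work is inherently serial."""
    return 1.0 / (f + (1.0 - f) / p)

def gustafson_speedup(s, p):
    """Gustafson's scaled speedup: the problem grows with p while
    the serial fraction s of the scaled run stays constant."""
    return p - s * (p - 1)

def timed_speedup(W, c, p):
    """A timing model that separates computing from communication:
    T(p) = W/p + c*p, i.e. per-processor work plus a communication
    cost that grows with the degree of parallelism.  Speedup
    W / T(p) then peaks at a finite optimal p (about sqrt(W/c))."""
    return W / (W / p + c * p)

if __name__ == "__main__":
    # With a 5% serial fraction, Amdahl caps speedup below 20
    # no matter how many processors are used.
    print(amdahl_speedup(0.05, 1024))      # ~19.6
    # Gustafson's scaled speedup keeps growing nearly linearly.
    print(gustafson_speedup(0.05, 1024))   # ~972.9
    # Separating communication exposes an optimal parallelism degree.
    best = max(range(1, 2049), key=lambda p: timed_speedup(1e6, 1.0, p))
    print(best)                            # 1000, i.e. sqrt(W/c)
```

The third model illustrates the abstract's point: without a distinct communication term, speedup predictions monotonically improve with p and never reveal that "slower may be better" past the optimum.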
