Application signature: a new way to predict application performance

Advances in digital computers have been spectacular, but the machines have become increasingly complex to model. Even cycle-accurate simulators, which are costly to develop and run, have questionable accuracy. This thesis provides a simple, scientifically proven, analytic model that accurately predicts the performance of real applications. The method creates two profiles as functions of time or problem size. The first profile, the Hardware Signature, reveals the speed of the computer hardware; it is obtained by running a universal benchmark, HINT, or by running an analytical model, AHINT. The second profile, the Application Signature (APPMAP), reveals the intrinsic requirements of the application; it can be obtained by four different methods outlined in the thesis. The convolution of these two profiles is used to predict real application performance. The model was tested using 25000 performance measurements and was validated by determining Pearson's correlation, Spearman's rank correlation, and the maximum deviation from linearity. Furthermore, through the Hardware Signature of the analytical models, one can obtain precise answers to questions about the optimum size of memory and caches and about the numerical precision for a given clock rate.
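The abstract does not spell out how the two profiles are combined or how the correlations are computed, so the following is only an illustrative sketch under stated assumptions. It assumes both profiles are sampled at the same problem sizes, and it stands in for the thesis's convolution with a simple pointwise sum of work divided by speed. The function names (`predict_time`, `pearson`, `ranks`, `spearman`) and the units are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def predict_time(hardware_speed, app_work):
    """Combine two profiles sampled at the same problem sizes.

    ASSUMPTION: hardware_speed[i] is the sustained speed (operations per
    second) at size i, and app_work[i] is the number of operations the
    application performs at that size. The thesis's actual operator may differ.
    """
    if len(hardware_speed) != len(app_work):
        raise ValueError("profiles must be sampled at the same sizes")
    return sum(work / speed for work, speed in zip(app_work, hardware_speed))


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

Given a list of predicted run times and a list of measured run times, `pearson(predicted, measured)` indicates how linear the relationship is, while `spearman(predicted, measured)` indicates only whether the ordering of machines or runs is preserved.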
