A Framework for Qualitative Performance Prediction

Performance prediction models at the source code level are crucial components of advanced optimizing compilers, programming environments, and performance debugging tools. Compilers and programming environments use performance models to guide the selection of effective code improvement strategies. Performance debugging tools may use them to explain a program's performance behavior to the user. Finding the best match between a performance prediction model and a specific source-level optimization or performance explanation task is a challenging problem. The best model for a given task is one that satisfies the precision requirements while including as few performance factors as possible, which minimizes the cost of each prediction. In optimizing compilers, the lack of such a cost-effective performance model may make applying an optimization prohibitively expensive. In a programming environment, marginal performance factors should be avoided because they obscure reasoning about the observed performance behavior. This paper discusses a new qualitative performance prediction framework at the program source level that automatically selects a minimal set of performance factors for a target system and performance precision requirement. In the context of this paper, a target system consists of a compiler, an operating system, and a machine architecture. The framework identifies the significant target system and application program parameters that must be considered to achieve the requested precision. Such parameters may include application factors, such as the number and type of floating point operations, and machine characteristics, such as L1 and L2 caches, TLB, and main memory.
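The abstract does not say how the framework selects its factors, so the following is only a minimal sketch of the general idea of choosing the smallest factor set that meets a precision requirement. Everything in it is an assumption made for illustration: the function names, the per-event cost table `UNIT_COST`, the event counts and measured times in `SAMPLES`, and the exhaustive smallest-subset-first search are all hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical cost (in cycles) charged per event of each performance factor.
UNIT_COST = {"flop": 1.0, "l1_miss": 10.0, "l2_miss": 50.0, "tlb_miss": 30.0}

# Hypothetical event counts and measured run times (in cycles) for two runs.
SAMPLES = [
    {"counts": {"flop": 1000, "l1_miss": 100, "l2_miss": 10, "tlb_miss": 1},
     "measured": 2530.0},
    {"counts": {"flop": 2000, "l1_miss": 300, "l2_miss": 20, "tlb_miss": 1},
     "measured": 6030.0},
]


def predict(sample, factors, unit_cost):
    """Predicted time using only the selected factors."""
    return sum(sample["counts"][f] * unit_cost[f] for f in factors)


def select_minimal_factors(samples, unit_cost, tolerance):
    """Return the smallest factor subset whose worst relative error over all
    samples is within `tolerance`, or None if no subset qualifies."""
    names = sorted(unit_cost)
    for size in range(1, len(names) + 1):        # try small subsets first
        for subset in combinations(names, size):
            worst = max(
                abs(predict(s, subset, unit_cost) - s["measured"]) / s["measured"]
                for s in samples
            )
            if worst <= tolerance:
                return subset
    return None  # may hint at a significant factor missing from the table


if __name__ == "__main__":
    for tol in (0.25, 0.05, 0.001):
        print(tol, select_minimal_factors(SAMPLES, UNIT_COST, tol))
```

In this toy setting a looser precision requirement yields a smaller factor set, and a `None` result corresponds to the case where no model of the desired quality can be built from the factors under consideration. The exhaustive search is exponential in the number of factors and is used here only because it makes the "minimal set" criterion explicit.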
The reported performance factors can be used by a compiler writer to build or validate a quantitative performance model, and by a user to better understand the observed program performance. In addition, the failure of the framework to produce a model of the desired quality may indicate that a significant performance factor exists that is not considered within the framework. Such information is important in guiding a compiler writer or user toward a more efficient search for crucial performance factors. Preliminary experimental results for a small computation kernel and a set of twelve target systems indicate the effectiveness of our framework. The target systems for the experiment consisted of four machine architectures (SuperSPARC I-II and UltraSPARC I-II running Solaris 2.5) and three compiler optimization levels (-none, -O3, -depend -fast). Our prototype framework determines different performance models (1) across different precision requirements for the same target

∗e-mail: chunghsu@cs.rutgers.edu, uli@cs.rutgers.edu; address: Department of Computer Science, Hill Center, Busch Campus, Rutgers University, Piscataway, NJ 08855
