Kernel Optimizations and Prefetch with the Spike Executable Optimizer

∗ Luk, Muth, Patil, and Lowney are currently with Intel Corporation, Massachusetts Microprocessor Design Center, Shrewsbury, MA.

ABSTRACT

Spike is an executable optimizer that uses profile information to place application code for improved fetch efficiency and a reduced cache footprint. This placement reduces the number of cache misses and the latency they add to program execution time. This paper presents extensions of Spike that exploit three additional performance opportunities: 1) optimization of Unix kernel code, 2) prefetching to hide the latency of long-latency loads, and 3) prefetching for loads with predictable strides that are not detected at compile time.
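The abstract describes stride prefetching only at a high level. As a hedged illustration of the general idea, and not Spike's actual algorithm, the sketch below assumes a profile that records the sequence of addresses touched by a single load instruction. It reports the dominant stride when that stride covers enough of the observed address deltas, and computes the byte offset a prefetch would use to run a fixed number of iterations ahead. The names `dominant_stride`, `prefetch_offset`, `STRIDE_CONFIDENCE`, the 0.75 threshold, and the distance of 8 iterations are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Sequence

# Hypothetical threshold: the dominant stride must cover at least this
# fraction of consecutive address deltas before the load counts as "strided".
STRIDE_CONFIDENCE = 0.75


def dominant_stride(addrs: Sequence[int], confidence: float = STRIDE_CONFIDENCE) -> int:
    """Return the most common delta between consecutive profiled addresses,
    or 0 if no single nonzero delta accounts for at least `confidence` of them."""
    if len(addrs) < 2:
        return 0  # too few samples to observe any stride
    deltas = [b - a for a, b in zip(addrs, addrs[1:])]
    stride, count = Counter(deltas).most_common(1)[0]
    # A zero stride means the load keeps hitting the same address, which
    # would already be cached, so no prefetch is worthwhile.
    if stride == 0 or count < confidence * len(deltas):
        return 0
    return stride


def prefetch_offset(stride: int, distance: int) -> int:
    """Byte offset to add to the load's address so that the prefetch targets
    data needed `distance` iterations later (distance is a tuning assumption)."""
    return stride * distance


if __name__ == "__main__":
    # Hypothetical profiled addresses for one load: stride 64 with one outlier jump.
    trace = [0x1000, 0x1040, 0x1080, 0x10C0, 0x2000, 0x2040, 0x2080, 0x20C0, 0x2100]
    s = dominant_stride(trace)
    print(s, prefetch_offset(s, 8))  # prints: 64 512
```

The single outlier jump (from 0x10C0 to 0x2000) does not disqualify the load, because 7 of its 8 deltas share the same stride. A load whose deltas are all different returns 0, so no prefetch would be inserted for it.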
