Calculating stack distances efficiently

This paper1 describes our experience using the stack processing algorithm [6] for estimating the number of cache misses in scientific programs. By using a new data structure and various optimization techniques we obtain instrumented run-times within 50 to 100 times the original optimized run-times of our benchmarks.