Optimizing application performance: a case study using LMbench

Understanding and improving application performance is valuable but difficult. Microbenchmarks measure system-dependent parameters that can guide optimization and so reduce code execution time. LMbench is a microbenchmarking suite that includes lat_mem_rd, a microbenchmark that measures memory read latency and thereby exposes the characteristics of the cache hierarchy. We apply increasing levels of optimization to matrix multiplication and study the resulting speedup relative to naive code compiled with full compiler optimization. Using general cache optimization techniques, which require no system-dependent knowledge, we obtain a speedup factor of 3.3 on an IA-64 Itanium machine for a 1500x1500 element matrix. Using lat_mem_rd results and a moderate amount of effort to implement blocking, with all else held constant, we obtain a speedup factor of 5. LMbench can be applied in the same way to other application software to improve the performance of critical code.
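As an illustration of the blocking technique mentioned above, the following is a minimal sketch in C, not the implementation evaluated in this study. The function names `matmul_naive` and `matmul_blocked` and the block-size parameter `bs` are hypothetical. The sketch assumes square, row-major matrices of doubles, and assumes that `bs` would be chosen by hand from the cache sizes suggested by lat_mem_rd output, for example so that three `bs` x `bs` tiles of doubles fit in the targeted cache level.

```c
#include <assert.h>
#include <stddef.h>

/* Naive reference: C += A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/*
 * Blocked (tiled) variant: C += A * B, processed in bs x bs tiles so that
 * the tiles of A, B and C in use can stay resident in cache.
 * bs is an assumed tuning parameter; it need not divide n.
 */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = min_size(ii + bs, n);
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = min_size(kk + bs, n);
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = min_size(jj + bs, n);
                /* Multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Innermost loop walks rows of B and C contiguously. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions accumulate into `c`, so the caller is expected to zero it first. Any speedup from the blocked version depends on the machine and on the choice of `bs`; the figures reported in this abstract apply only to the authors' own code and platform.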