REDSPY: Exploring Value Locality in Software

Complex code bases with several layers of abstraction harbor abundant inefficiencies that increase execution time. Value redundancy is one such inefficiency: the same values are repeatedly computed, stored, or retrieved over the course of execution. Not all redundancies can be easily detected or eliminated by compiler optimization passes, owing to the inherent limitations of static analysis. Microscopic observation of whole executions at instruction- and operand-level granularity breaks down abstractions and helps recognize redundancies that masquerade in complex programs. We have developed REDSPY, a fine-grained profiler that pinpoints and quantifies redundant operations in program executions. Value redundancy may occur over time at the same location or across adjacent locations, and thus exhibits both temporal and spatial locality; REDSPY identifies both. Furthermore, REDSPY can identify values that are approximately the same, which exposes optimization opportunities in HPC codes that often use floating-point computations. REDSPY provides intuitive optimization guidance by attributing redundancies to their provenance: source lines and execution calling contexts. REDSPY pinpointed a dramatically high volume of redundancies in programs that have been optimization targets for decades, such as the SPEC CPU2006 suite, the Rodinia benchmarks, and NWChem, a production computational chemistry code. Guided by REDSPY, we eliminated redundancies, which resulted in significant speedups.
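The three notions of redundancy described above (temporal, spatial, and approximate) can be illustrated with a toy model. The sketch below is not REDSPY's implementation; the class `ToyRedundancyTracker`, its word-indexed addresses, and the `rel_tol` parameter are hypothetical names chosen for illustration. It assumes a store is temporally redundant when it writes the value already held at that address, and spatially redundant when it writes the value held at the preceding address.

```python
import math


class ToyRedundancyTracker:
    """Illustrative shadow memory that counts redundant stores.

    Hypothetical model for exposition only, not REDSPY's design.
    Addresses are plain integers, one value per address.
    """

    def __init__(self, rel_tol=0.0):
        self.rel_tol = rel_tol  # 0.0 means exact comparison
        self.memory = {}        # address -> last value stored
        self.total = 0          # all stores observed
        self.temporal = 0       # store repeats the value at the same address
        self.spatial = 0        # store repeats the value at address - 1

    def _same(self, a, b):
        # Exact match by default; relative-tolerance match when requested.
        if self.rel_tol == 0.0:
            return a == b
        return math.isclose(a, b, rel_tol=self.rel_tol)

    def store(self, addr, value):
        self.total += 1
        old = self.memory.get(addr)
        if old is not None and self._same(old, value):
            self.temporal += 1
        neighbor = self.memory.get(addr - 1)
        if neighbor is not None and self._same(neighbor, value):
            self.spatial += 1
        self.memory[addr] = value


exact = ToyRedundancyTracker()
for _ in range(3):
    exact.store(0, 5)         # the 2nd and 3rd writes repeat the 1st
for addr in range(10, 14):
    exact.store(addr, 0)      # each write after the first matches its neighbor

approx = ToyRedundancyTracker(rel_tol=1e-6)
approx.store(0, 1.0)
approx.store(0, 1.0000001)    # differs by about 1e-7, within tolerance
```

In this toy run, `exact` observes 7 stores, of which 2 are temporally redundant and 3 are spatially redundant, while `approx` flags its second store as redundant only because a tolerance is set. A real profiler would additionally attribute each redundant operation to a source line and calling context, which this sketch omits.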
