Instruction based memory distance analysis and its application to optimization

Feedback-directed optimization has become an increasingly important tool in designing and building optimizing compilers because it provides a means to analyze complex program behavior that traditional static analysis cannot capture. It offers the compiler opportunities to analyze and optimize the memory behavior of programs even when traditional array-based analysis is not applicable. As a result, both floating-point and integer programs can benefit from memory hierarchy optimization. In this paper, we examine the notion of memory distance as it is applied to the instruction space of a program and to feedback-directed optimization. Memory distance is defined as a dynamic, quantifiable distance, measured in memory references, between two accesses to the same memory location. We use memory distance to predict the miss rates of instructions in a program. Using the miss rates, we then identify the program's critical instructions (the set of high-miss instructions whose cumulative misses account for 95% of the L2 cache misses in the program). Our experiments show that memory-distance analysis can effectively identify critical instructions in both integer and floating-point programs. Additionally, we apply memory-distance analysis to memory disambiguation in out-of-order issue processors, using those distances to determine when a load may be speculated ahead of a preceding store. Our experiments show that memory-distance-based disambiguation on average achieves within 5-10% of the performance gain of the store set technique, which requires a hardware table.
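To make the idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes reuse distance (the number of distinct locations touched between two accesses to the same location) as the form of memory distance, a fully associative LRU cache whose capacity is counted in locations, and a trace of `(pc, addr)` pairs. The function names, the trace format, and the `capacity` and `coverage` parameters are all hypothetical, introduced only for illustration.

```python
from collections import OrderedDict, defaultdict


def reuse_distances(trace):
    """Return (pc, distance) per access; distance is None on a first access.

    Assumption: distance counts the distinct addresses touched since the
    previous access to the same address (reuse distance).
    """
    stack = OrderedDict()  # LRU stack, most recently used address last
    result = []
    for pc, addr in trace:
        if addr in stack:
            keys = list(stack.keys())
            dist = len(keys) - 1 - keys.index(addr)
            stack.move_to_end(addr)
        else:
            dist = None
            stack[addr] = True
        result.append((pc, dist))
    return result


def predicted_misses(distances, capacity):
    """Count predicted misses per instruction for an assumed LRU cache.

    An access is treated as a miss if it is a first access or if its
    distance is at least `capacity` locations.
    """
    misses = defaultdict(int)
    for pc, dist in distances:
        if dist is None or dist >= capacity:
            misses[pc] += 1
    return dict(misses)


def critical_instructions(misses, coverage=0.95):
    """Pick highest-miss instructions until `coverage` of misses is reached."""
    total = sum(misses.values())
    chosen, covered = [], 0
    for pc, count in sorted(misses.items(), key=lambda kv: -kv[1]):
        if covered >= coverage * total:
            break
        chosen.append(pc)
        covered += count
    return chosen


# Hypothetical trace: (instruction address, memory location)
trace = [(0x10, "a"), (0x14, "b"), (0x18, "c"),
         (0x10, "a"), (0x14, "b"), (0x1C, "a")]
dists = reuse_distances(trace)
misses = predicted_misses(dists, capacity=2)
critical = critical_instructions(misses, coverage=0.95)
```

The linear scan of the LRU stack keeps the sketch short; a practical profiler would likely use a tree-based structure to compute distances in logarithmic time per access, and would work from profiled distance distributions rather than a full trace.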
