Predicting locality phases for dynamic memory optimization

Dynamic data, cache, and memory adaptation can significantly improve program performance when they are applied on long continuous phases of execution that have dynamic but predictable locality. To support phase-based adaptation, this paper defines the concept of locality phases and describes a four-component analysis technique. Locality-based phase detection uses locality analysis and signal processing techniques to identify phases from the data access trace of a program; frequency-based phase marking inserts code markers that mark phases in all executions of the program; phase hierarchy construction identifies the structure of multiple phases; and phase-sequence prediction predicts the phase sequence from program input parameters. The paper shows the accuracy and the granularity of phase and phase-sequence prediction as well as its uses in dynamic data packing, memory remapping, and cache resizing.

[1]  Ken Kennedy,et al.  Improving memory hierarchy performance for irregular applications , 1999, ICS '99.

[2]  Chen Ding,et al.  Miss rate prediction across all program inputs , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[3]  Chen Ding,et al.  Locality phase prediction , 2004, ASPLOS XI.

[4]  Irving L. Traiger,et al.  Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..

[5]  Chen Ding,et al.  Array regrouping and structure splitting using whole-program reference affinity , 2004, PLDI '04.

[6]  Margaret Martonosi,et al.  Wavelet analysis for microprocessor design: experiences with wavelet-based dI/dt characterization , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[7]  Ian H. Witten,et al.  Identifying Hierarchical Structure in Sequences: A linear-time algorithm , 1997, J. Artif. Intell. Res..

[8]  Sandhya Dwarkadas,et al.  Characterizing and predicting program behavior and its variability , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[9]  J. Larus Whole program paths , 1999, PLDI '99.

[10]  Alan P. Batson,et al.  Measurements of major locality phases in symbolic reference strings , 1976, SIGMETRICS '76.

[11]  Joel H. Saltz,et al.  Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures , 1994, J. Parallel Distributed Comput..

[12]  Alan Jay Smith,et al.  Aspects of cache memory and instruction buffer performance , 1987 .

[13]  Keith D. Cooper,et al.  Engineering a Compiler , 2003 .

[14]  Jeffrey D. Ullman,et al.  Introduction to Automata Theory, Languages and Computation , 1979 .

[15]  Steve Carr,et al.  Instruction based memory distance analysis and its application to optimization , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[16]  Santosh G. Abraham,et al.  Efficient simulation of caches under optimal replacement with applications to miss characterization , 1993, SIGMETRICS '93.

[17]  Kristof Beyls,et al.  Generating cache hints for improved program efficiency , 2005, J. Syst. Archit..

[18]  Peter F. Sweeney,et al.  Multiple page size modeling and optimization , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[19]  John Cocke,et al.  A program data flow analysis procedure , 1976, CACM.

[20]  Alan Eustace,et al.  ATOM - A System for Building Customized Program Analysis Tools , 1994, PLDI.

[21]  Ken Kennedy,et al.  Improving cache performance in dynamic applications through data and computation reorganization at run time , 1999, PLDI '99.

[22]  David A. Padua,et al.  Estimating cache misses and locality using stack distances , 2003, ICS '03.

[23]  Chen Ding,et al.  Regression-Based Multi-Model Prediction of Data Reuse Signature , 2003 .

[24]  Ingrid Daubechies,et al.  Ten Lectures on Wavelets , 1992 .

[25]  Lieven Eeckhout,et al.  Method-level phase behavior in java workloads , 2004, OOPSLA.

[26]  John M. Mellor-Crummey,et al.  Cross-architecture performance predictions for scientific applications using parameterized models , 2004, SIGMETRICS '04/Performance '04.

[27]  M. Scott,et al.  Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[28]  PaduaDavid,et al.  The LRPD test , 1995 .

[29]  Timothy Sherwood,et al.  Wavelet-based phase classification , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[30]  Brad Calder,et al.  Phase tracking and prediction , 2003, ISCA '03.

[31]  Larry Carter,et al.  Compile-time composition of run-time data and iteration reorderings , 2003, PLDI '03.

[32]  Erik R. Altman,et al.  Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques , 2006, PACT 2006.

[33]  Michael C. Huang,et al.  Positional adaptation of processors: application to energy reduction , 2003, ISCA '03.

[34]  John B. Carter,et al.  Efficient remapping mechanisms for an adaptable memory system , 2002 .

[35]  Trishul M. Chilimbi Efficient representations and abstractions for quantifying and exploiting data reference locality , 2001, PLDI '01.

[36]  Yutao Zhong,et al.  Predicting whole-program locality through reuse distance analysis , 2003, PLDI '03.

[37]  Frederica Darema,et al.  Memory access patterns of parallel scientific programs , 1987, SIGMETRICS '87.

[38]  Zhen Fang,et al.  The Impulse Memory Controller , 2001, IEEE Trans. Computers.

[39]  Shikharesh Majumdar,et al.  A measure of program locality and its application , 1984, SIGMETRICS '84.

[40]  R. Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[41]  Brad Calder,et al.  Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[42]  Ulrich Kremer,et al.  The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction , 2003, PLDI '03.

[43]  L. Rauchwerger,et al.  The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization , 1999, IEEE Trans. Parallel Distributed Syst..

[44]  Paul D. Hovland,et al.  Metrics and models for reordering transformations , 2004, MSP '04.

[45]  Ken Kennedy,et al.  Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings , 2001, International Journal of Parallel Programming.

[46]  Brad Calder,et al.  Selecting software phase markers with code structure analysis , 2006, International Symposium on Code Generation and Optimization (CGO'06).

[47]  Tao Li,et al.  Complexity-based program phase analysis and classification , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[48]  Wei Liu,et al.  EXPERT: expedited simulation exploiting program behavior repetition , 2004, ICS '04.

[49]  Chuan-Qi Zhu,et al.  A Scheme to Enforce Data Dependence on Large Multiprocessor Systems , 1987, IEEE Transactions on Software Engineering.

[50]  Chen Ding,et al.  Characterizing Phases in Service-Oriented Applications , 2004 .

[51]  Ken Kennedy,et al.  Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .

[52]  Chau-Wen Tseng,et al.  Improving Locality for Adaptive Irregular Scientific Codes , 2000, LCPC.

[53]  Hwansoo Han,et al.  Locality Optimizations For Adaptive Irregular Scientific Codes , 2000 .

[54]  TorrellasJosep,et al.  Positional adaptation of processors , 2003 .