A survey on cache tuning from a power/energy perspective

Low power and/or energy consumption is a requirement not only in embedded systems that run on batteries or have limited cooling capabilities, but also in desktops and mainframes, where chips require costly cooling techniques. Since the cache subsystem is typically the most power/energy-consuming subsystem, caches are good candidates for power/energy optimizations, and cache tuning techniques are therefore widely researched. This survey focuses on state-of-the-art offline static and online dynamic cache tuning techniques and summarizes the techniques' attributes, major challenges, and potential research trends to inspire novel ideas and future research avenues.
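
To make "offline static cache tuning" concrete, here is a minimal sketch of its simplest form: before deployment, simulate every configuration in a small design space (total size, associativity, line size) over a recorded address trace and keep the one with the lowest estimated energy. The trace, the design space, and especially the energy model (`toy_energy` and its coefficients) are hypothetical assumptions chosen for illustration; they are not taken from the survey or from any real cache. Online dynamic tuning, which changes the configuration at runtime, is not shown.

```python
from collections import OrderedDict
from itertools import product


def count_misses(trace, size_bytes, assoc, line_bytes):
    """Simulate one set-associative LRU cache over a list of byte addresses."""
    num_sets = size_bytes // (assoc * line_bytes)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        ways = sets[block % num_sets]
        tag = block // num_sets
        if tag in ways:
            ways.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) == assoc:
                ways.popitem(last=False)  # set full: evict least recently used tag
            ways[tag] = None
    return misses


def toy_energy(num_accesses, misses, size_bytes, assoc, line_bytes):
    """Hypothetical energy model in arbitrary units; coefficients are made up."""
    per_access = 0.1 + 0.02 * (size_bytes / 1024) * assoc  # larger/wider cache: costlier access
    per_miss = 2.0 + 0.05 * line_bytes                      # longer lines: costlier refill
    return num_accesses * per_access + misses * per_miss


def tune_offline(trace, sizes, assocs, line_sizes):
    """Exhaustively evaluate each configuration; return (energy, (size, assoc, line))."""
    best = None
    for size, assoc, line in product(sizes, assocs, line_sizes):
        if size < assoc * line:
            continue  # too small to hold even one set
        misses = count_misses(trace, size, assoc, line)
        energy = toy_energy(len(trace), misses, size, assoc, line)
        if best is None or energy < best[0]:
            best = (energy, (size, assoc, line))
    return best


if __name__ == "__main__":
    # Hypothetical workload: sweep a 4 KB array with a 32-byte stride, ten times.
    trace = [i * 32 for i in range(128)] * 10
    print(tune_offline(trace, [2048, 4096, 8192], [1, 2, 4], [16, 32, 64]))
```

With this toy model and trace, every 2 KB configuration thrashes because the 4 KB working set does not fit, so the search settles on a 4 KB direct-mapped cache with 64-byte lines. Exhaustive search is exact but its cost grows with the product of the parameter ranges, which is why faster tuning heuristics and single-pass simulation are of interest.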

Ann Gordon-Ross et al., A survey on cache tuning from a power/energy perspective, 2013.