Phase distance mapping: a phase-based cache tuning methodology for embedded systems

Networked embedded systems typically leverage a collection of low-power embedded systems (nodes) to collaboratively execute applications spanning diverse application domains (e.g., video, image processing, and communication) with diverse application requirements. The individual networked nodes must operate under stringent constraints (e.g., energy and memory) and should be specialized to meet varying applications’ requirements in order to adhere to these constraints. Phase-based tuning specializes a system’s tunable parameters to the varying runtime requirements of an application’s different phases of execution to meet optimization goals. Since the design space for tunable systems can be very large, one of the major challenges in phase-based tuning is determining the best configuration for each phase without incurring significant tuning overhead (e.g., energy and/or performance) during design space exploration. In this paper, we propose phase distance mapping, which directly determines the best configuration for a phase, thereby eliminating design space exploration. Phase distance mapping uses the correlation between a known phase’s characteristics and its best configuration to determine a new phase’s best configuration from the new phase’s characteristics. Experimental results verify that our phase distance mapping approach, when applied to cache tuning, determines cache configurations within 1% of the optimal configurations on average and yields energy-delay product savings of 27% on average.
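
The mapping idea described above can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the choice of a single per-phase statistic (for example, a cache miss rate), the normalized distance formula, the `DISTANCE_WINDOWS` thresholds, the `SIZES_KB` design space, and the rule that only cache size is adjusted. The sketch only shows how a known base phase and its best configuration could be used to pick a configuration for a new phase without exploring the design space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    size_kb: int
    line_bytes: int
    assoc: int


# Hypothetical design space for the cache size (assumption, not from the paper).
SIZES_KB = [2, 4, 8]

# Hypothetical distance windows: (exclusive upper bound on |distance|, size steps).
# Larger distances from the base phase map to larger configuration changes.
DISTANCE_WINDOWS = [(0.5, 0), (1.5, 1), (float("inf"), 2)]


def phase_distance(base_stat: float, new_stat: float) -> float:
    """Signed, normalized difference between the new and base phase statistics."""
    return (new_stat - base_stat) / base_stat


def map_config(base_cfg: CacheConfig, base_stat: float, new_stat: float) -> CacheConfig:
    """Pick a configuration for a new phase directly from its distance to the base phase."""
    d = phase_distance(base_stat, new_stat)
    # Find the first window whose bound exceeds the distance magnitude.
    steps = next(s for bound, s in DISTANCE_WINDOWS if abs(d) < bound)
    direction = 1 if d > 0 else -1
    idx = SIZES_KB.index(base_cfg.size_kb) + direction * steps
    # Clamp to the available design space.
    idx = max(0, min(len(SIZES_KB) - 1, idx))
    return CacheConfig(SIZES_KB[idx], base_cfg.line_bytes, base_cfg.assoc)


if __name__ == "__main__":
    base = CacheConfig(size_kb=4, line_bytes=32, assoc=2)
    # Base phase statistic 0.02; a new phase with statistic 0.04 is one window away.
    print(map_config(base, base_stat=0.02, new_stat=0.04))
```

In this sketch the cost of choosing a configuration is a single distance computation and a table lookup, which is the property the abstract attributes to phase distance mapping; how the windows and adjustments are actually derived is left to the paper itself.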
