Paper Information - Leveraging on Deep Memory Hierarchies to Minimize Energy Consumption and Data Access Latency on Single-Chip Cloud Computers

Recent advances in chip design and integration technologies have led to the development of Single-Chip Cloud Computers, which are a microcosm of cloud datacenters. These computers are based on Network-on-Chip (NoC) architectures with deep memory hierarchies. Developing scheduling algorithms that reduce both data access latency and energy consumption is a major challenge for such architectures. In this paper, we propose a set of algorithms that jointly address task scheduling and data allocation in a unified approach. Moreover, we present a feasible system model for NoC-based multicores with a three-level memory hierarchy that captures the energy consumed by the various elements of the system, including the processing cores, the caches, and the NoC subsystem. Simulation results show that the proposed algorithms outperform two state-of-the-art algorithms from the literature. The experimental results clearly indicate that algorithms performing data and task scheduling jointly are superior to techniques that perform task and data scheduling separately.

[1] José González,et al. Distributed Cooperative Caching: An Energy Efficient Memory Scheme for Chip Multiprocessors , 2012, IEEE Transactions on Parallel and Distributed Systems.

[2] Shaik Mahmed. A 16-Core Processor with Shared-Memory and Message-Passing Communications , 2015 .

[3] Hong He,et al. Task assignment in heterogeneous computing systems using an effective iterated greedy algorithm , 2011, J. Syst. Softw..

[4] Xu Cheng,et al. A 16-Core Processor With Shared-Memory and Message-Passing Communications , 2014, IEEE Transactions on Circuits and Systems I: Regular Papers.

[5] Wei-Che Tseng,et al. Data Allocation Optimization for Hybrid Scratch Pad Memory With SRAM and Nonvolatile Memory , 2013, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[6] Valentin Deaconu. Directed Graphs , 2010, Encyclopedia of Machine Learning.

[7] Edwin Hsing-Mean Sha,et al. Efficient assignment and scheduling for heterogeneous DSP systems , 2005, IEEE Transactions on Parallel and Distributed Systems.

[8] N. Muralimanohar,et al. CACTI 6 . 0 : A Tool to Understand Large Caches , 2007 .

[9] Wayne H. Wolf,et al. TGFF: task graphs for free , 1998, Proceedings of the Sixth International Workshop on Hardware/Software Codesign. (CODES/CASHE'98).

[10] Vivek Sarkar,et al. Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement , 2009, LCPC.

[11] Quan Chen,et al. Adaptive Cache Aware Bitier Work-Stealing in Multisocket Multicore Architectures , 2013, IEEE Transactions on Parallel and Distributed Systems.

[12] Ajoy Kumar Datta,et al. CPU Scheduling for Power/Energy Management on Multicore Processors Using Cache Miss and Context Switch Data , 2014, IEEE Transactions on Parallel and Distributed Systems.

[13] Tulika Mitra,et al. Integrated scratchpad memory optimization and task scheduling for MPSoC architectures , 2006, CASES '06.

[14] Karthick Rajamani,et al. Tiered Memory: An Iso-Power Memory Architecture to Address the Memory Power Wall , 2012, IEEE Transactions on Computers.

[15] Kenli Li,et al. Energy-Aware Data Allocation and Task Scheduling on Heterogeneous Multiprocessor Systems With Time Constraints , 2014, IEEE Transactions on Emerging Topics in Computing.

[16] César A. M. Marcon,et al. Partitioning and mapping on NoC-Based MPSoC: an energy consumption saving approach , 2011, NoCArc '11.

[17] Lothar Thiele,et al. Dynamic Power-Aware Mapping of Applications onto Heterogeneous MPSoC Platforms , 2010, IEEE Transactions on Industrial Informatics.

[18] Andrew A. Chien,et al. The future of microprocessors , 2011, Commun. ACM.

[19] Meikang Qiu,et al. Data Placement and Duplication for Embedded Multicore Systems With Scratch Pad Memory , 2013, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[20] Wei Zhang,et al. Hybrid SPM-cache architectures to achieve high time predictability and performance , 2013, 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors.

[21] David Daly,et al. The cache and memory subsystems of the IBM POWER8 processor , 2015, IBM J. Res. Dev..

[22] Yi He,et al. Co-optimization of memory access and task scheduling on MPSoC architectures with multi-level memory , 2010, 2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC).

[23] Nikil D. Dutt,et al. NoC-based fault-tolerant cache design in chip multiprocessors , 2014, ACM Trans. Embed. Comput. Syst..

[24] Jean-Luc Dekeyser,et al. Estimating Energy Consumption for an MPSoC Architectural Exploration , 2006, ARCS.

[25] Nikil D. Dutt,et al. HaVOC: A hybrid memory-aware virtualization layer for on-chip distributed ScratchPad and Non-Volatile Memories , 2012, DAC Design Automation Conference 2012.

[26] Wei-Che Tseng,et al. Minimizing Access Cost for Multiple Types of Memory Units in Embedded Systems Through Data Allocation and Scheduling , 2012, IEEE Transactions on Signal Processing.

[27] Y.-K. Kwok,et al. Static scheduling algorithms for allocating directed task graphs to multiprocessors , 1999, CSUR.

[28] Nikil D. Dutt,et al. A novel NoC-based design for fault-tolerance of last-level caches in CMPs , 2012, CODES+ISSS '12.

[29] Hai Jin,et al. DAGMap: efficient and dependable scheduling of DAG workflow job in Grid , 2010, The Journal of Supercomputing.

[30] Naehyuck Chang,et al. System-Level Performance and Power Optimization for MPSoC , 2015, ACM Trans. Embed. Comput. Syst..

[31] Christoforos E. Kozyrakis,et al. Towards energy-proportional datacenter memory with mobile DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[32] Mani Azimi,et al. Integration Challenges and Tradeoffs for Terascale Architectures , 2007 .

[33] David Wentzlaff,et al. Processor: A 64-Core SoC with Mesh Interconnect , 2010 .

[34] Partha Pratim Pande,et al. Performance evaluation and design trade-offs for wireless network-on-chip architectures , 2012, JETC.