Paper information - Locality management using multiple SPMs on the Multi-Level Computing Architecture

The Multi-Level Computing Architecture (MLCA) is a novel system-on-chip architecture for embedded systems, designed to exploit task-level and instruction-level parallelism in multimedia applications. The MLCA provides a unique two-level programming model that simplifies the development of embedded applications. To cope with increasing intra-system communication delays, we introduce a distributed-memory version of the MLCA in which separate storage is used for global and local application data. Global data is stored in multiple on-chip scratch-pad memories (SPMs) with non-uniform memory access (NUMA) latencies, while local data is stored in PU-private memories. In such designs, one of the key factors affecting application performance is the locality of access to global data. We introduce programming constructs and run-time support to dynamically manage the data stored in the SPMs and to influence run-time task scheduling. Collectively, our techniques improve performance by 6% to 40% compared to simple static memory management and scheduling approaches.

Tarek S. Abdelrahman | Ahmed M. Abdelkhalek
