A framework for memory-aware multimedia application mapping on chip-multiprocessors

The relentless growth in the requirements of multimedia embedded applications, together with improvements in IC design technology, has motivated the deployment of chip multiprocessor (CMP) architectures. Task scheduling and data placement in memory are two of the most important steps in the application customization process, as they greatly influence overall power consumption and performance. Most designers treat task scheduling and data placement as independent of each other. However, an optimal task schedule does not always yield an optimal data placement, and an optimal data placement does not necessarily allow for an optimal task schedule. In this paper, we propose a novel framework for simultaneous application mapping and data placement onto CMP architectures, targeting multimedia applications in particular. At the core of our framework is a memory-aware task scheduling algorithm that relies on static analysis and task splitting to reduce off-chip memory transfers. Our experiments on a JPEG2000 case study show that we can achieve up to 35% performance improvement and up to 66% power reduction compared to traditional scheduling/data allocation approaches.
