Paper Information - Cache Performance Optimization for SoC Vedio Applications

Chip Multiprocessors (CMPs) have been adopted by industry to overcome the speed limits of single processors. Memory access, however, has become the performance bottleneck, especially in multimedia applications. This paper proposes a set of management policies to improve cache performance on an SoC platform for video applications. Based on an analysis of the Video Engine's behavior, memory-friendly writeback and efficient prefetch policies are adopted. The experimental platform is simulated in SystemC with an ARM Cortex-A9 processor model. Experiments show that, compared with a general cache without a Last Level Cache (LLC), the proposed mechanism increases Hit Rate by up to 18.87% and decreases MM Latency by 10.62% and CPU Read Latency by 46.43% for VENC/16-way/64-byte; for VDEC/16-way/64-byte, it increases Hit Rate by up to 52.1% and decreases MM Latency by 11.43% and CPU Read Latency by 47.48%. Bandwidth increases by only 8.62% and 4.23%, respectively.

Wei Zhang | Xing Zhang | Lei Li | HuiYao An | HuaiQi Zhu
