Cache Performance Optimization for SoC Video Applications

Chip multiprocessors (CMPs) have been adopted by industry to overcome the speed limits of single processors. However, memory access has become the performance bottleneck, especially in multimedia applications. In this paper, a set of cache management policies is proposed to improve cache performance on an SoC platform for video applications. Based on an analysis of the behavior of the Video Engine, memory-friendly writeback and efficient prefetch policies are adopted. The experimental platform is simulated in SystemC with an ARM Cortex-A9 processor model. Experiments show that, compared with a general cache hierarchy without a last-level cache (LLC), the proposed mechanism improves performance as follows. For VENC (16-way, 64-byte lines), the hit rate increases by up to 18.87%, while MM latency and CPU read latency decrease by 10.62% and 46.43%, respectively. For VDEC (16-way, 64-byte lines), the hit rate increases by up to 52.1%, while MM latency and CPU read latency decrease by 11.43% and 47.48%, respectively. These gains come at the cost of only 8.62% and 4.23% additional bandwidth, respectively.
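To make the reported cache configuration (16 ways, 64-byte lines) and the hit-rate and bandwidth metrics concrete, here is a minimal Python sketch. It is not the authors' simulator, which is built in SystemC, and it does not implement their proposed policies, which the abstract does not detail. The 256 KiB capacity, LRU replacement, and simple next-line prefetcher are assumptions chosen only for illustration.

```python
from collections import OrderedDict


class SetAssocCache:
    """Hypothetical LRU set-associative cache model (illustrative sketch only)."""

    def __init__(self, size_bytes=256 * 1024, ways=16, line_bytes=64,
                 next_line_prefetch=False):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)
        # Each set maps tag -> True; OrderedDict order tracks LRU (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.prefetch = next_line_prefetch  # assumed generic next-line prefetcher
        self.accesses = 0
        self.hits = 0
        self.fills = 0  # lines brought in from memory, a rough bandwidth proxy

    def _locate(self, line_addr):
        # Low bits of the line address select the set; the rest form the tag.
        return self.sets[line_addr % self.num_sets], line_addr // self.num_sets

    def _fill(self, line_addr):
        s, tag = self._locate(line_addr)
        if tag in s:
            s.move_to_end(tag)  # already resident: just refresh its LRU position
            return
        if len(s) >= self.ways:
            s.popitem(last=False)  # evict the least recently used line
        s[tag] = True
        self.fills += 1

    def access(self, addr):
        line_addr = addr // self.line_bytes
        s, tag = self._locate(line_addr)
        self.accesses += 1
        if tag in s:
            self.hits += 1
            s.move_to_end(tag)
        else:
            self._fill(line_addr)
        if self.prefetch:
            # Speculatively bring in the next sequential line.
            self._fill(line_addr + 1)

    @property
    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0


if __name__ == "__main__":
    # Sequential 4-byte reads over 64 KiB, a streaming pattern loosely like a frame scan.
    for pf in (False, True):
        cache = SetAssocCache(next_line_prefetch=pf)
        for a in range(0, 64 * 1024, 4):
            cache.access(a)
        print(f"prefetch={pf}: hit rate={cache.hit_rate:.5f}, fills={cache.fills}")
```

On this purely sequential stream, the baseline misses once per 64-byte line (hit rate 15/16 = 0.9375, 1024 fills), while the next-line prefetcher misses only on the first access at the cost of one extra line fill. Real video workloads are far less regular, which is why the trade-off between hit rate and extra bandwidth matters.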