Hybrid scratchpad and cache memory management for energy-efficient parallel HEVC encoding

High Efficiency Video Coding (HEVC), the next-generation video coding standard, provides better compression rates than H.264 for high-resolution videos, at the cost of significantly higher demands on computation power and memory bandwidth. Memory subsystem optimization is therefore of paramount importance for supporting HEVC on resource- and energy-constrained embedded consumer electronics. In this paper, we present a hybrid on-chip memory architecture that combines caches and scratchpad memories (SPMs) for parallel HEVC encoding. A run-time prediction algorithm is proposed to identify the most frequently accessed memory regions in the search window(s) used to process individual coding tree units (CTUs). Depending on their intra- and inter-core reuse, these regions are loaded into the private or shared SPMs to guarantee on-chip memory accesses. A relatively small hardware-controlled cache serves the remaining data accesses. Moreover, an adaptive power gating scheme powers off SPM sectors whose load windows have expired, further reducing on-chip leakage power. Experimental results show that, compared with the state-of-the-art solution, the proposed memory management framework supports high-speed parallel HEVC processing with a substantially smaller on-chip memory, achieving up to 76.23% on-chip leakage energy savings and 33.31% energy savings for the overall memory subsystem.
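As a rough illustration of the placement idea described above, the following sketch assigns predicted hot regions to private SPM, shared SPM, or cache. It is not the algorithm proposed in the paper: the function name `place_regions`, the per-region access counts, the capacities expressed in number of regions, and the greedy hottest-first policy are all assumptions made for this example.

```python
from collections import defaultdict


def place_regions(access_counts, private_capacity, shared_capacity):
    """Greedy placement sketch (hypothetical, not the paper's algorithm).

    access_counts maps (core_id, region_id) -> predicted access count.
    Capacities are given in number of regions, an assumption for simplicity.
    Returns (private, shared, cached): private maps core_id -> list of
    regions, shared and cached are lists of regions.
    """
    # Collect, for every region, which cores are predicted to touch it.
    cores_by_region = defaultdict(dict)
    for (core, region), count in access_counts.items():
        if count > 0:
            cores_by_region[region][core] = count

    # Rank regions by total predicted accesses, hottest first
    # (ties broken by region id so the result is deterministic).
    ranked = sorted(
        cores_by_region,
        key=lambda r: (-sum(cores_by_region[r].values()), r),
    )

    private = defaultdict(list)
    shared = []
    cached = []
    for region in ranked:
        users = cores_by_region[region]
        if len(users) > 1 and len(shared) < shared_capacity:
            # Inter-core reuse: keep one copy in the shared SPM.
            shared.append(region)
        elif len(users) == 1:
            # Intra-core reuse only: use that core's private SPM if it has room.
            core = next(iter(users))
            if len(private[core]) < private_capacity:
                private[core].append(region)
            else:
                cached.append(region)
        else:
            # Shared SPM is full: fall back to the hardware-controlled cache.
            cached.append(region)
    return dict(private), shared, cached
```

Regions that do not fit in either SPM simply fall through to the cache, which mirrors the role the abstract gives to the small hardware-controlled cache; a real implementation would work with byte sizes and load windows rather than region counts.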
