BurstLink: Techniques for Energy-Efficient Video Display for Conventional and Virtual Reality Systems

Conventional planar video streaming is the most popular application in mobile systems. The rapid growth of 360° video content and virtual reality (VR) devices is accelerating the adoption of VR video streaming. Unfortunately, video streaming consumes significant system energy because the major system components involved in the streaming process (e.g., DRAM, display interfaces, and the display panel) draw high power. For example, in conventional planar video streaming, the video decoder (in the processor) decodes video frames and stores them in DRAM main memory. The display controller (in the processor) then transfers the decoded frames from DRAM to the display panel. This system architecture causes a large amount of data movement to and from DRAM as well as high DRAM bandwidth usage. As a result, DRAM by itself consumes more than 30% of the video streaming energy. We propose BurstLink, a novel system-level technique that improves the energy efficiency of planar and VR video streaming. BurstLink is based on two key ideas. First, BurstLink directly transfers a decoded video frame from the video decoder or the GPU to the display panel, completely bypassing host DRAM. To this end, we extend the display panel with a double remote frame buffer (DRFB) that replaces the double frame buffer conventionally kept in DRAM. The system can therefore write a new frame into the DRFB while the display panel updates its pixels with the current frame stored in the DRFB. Second, BurstLink transfers a complete decoded frame to the display panel in a single burst, using the maximum bandwidth of modern display interfaces. In conventional systems, the frame transfer rate is limited by the pixel-update throughput of the display panel. BurstLink, by contrast, decouples the frame transfer from the pixel update through the DRFB, so it can always take full advantage of the high bandwidth of modern display interfaces.
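The role of the DRFB can be illustrated with a small ping-pong buffer model. This is a minimal sketch under stated assumptions, not the paper's hardware design. The class `DoubleRemoteFrameBuffer` and its methods `burst_write` and `refresh` are hypothetical names. Frames are modeled as opaque byte strings, and buffer swaps are assumed to happen only at refresh boundaries.

```python
from dataclasses import dataclass, field
from typing import List, Optional

Frame = bytes  # a decoded frame, modeled as opaque pixel bytes (assumption)


@dataclass
class DoubleRemoteFrameBuffer:
    """Hypothetical model of a panel-side double remote frame buffer (DRFB).

    The host bursts the next frame into the back buffer while the panel
    updates its pixels from the front buffer; the buffers swap roles only
    at a refresh boundary.
    """
    buffers: List[Optional[Frame]] = field(default_factory=lambda: [None, None])
    front: int = 0  # index of the buffer the panel is scanning out

    @property
    def back(self) -> int:
        return 1 - self.front

    def burst_write(self, frame: Frame) -> None:
        # Host side: the whole frame lands in the back buffer in one burst,
        # without first being staged in host DRAM.
        self.buffers[self.back] = frame

    def refresh(self) -> Optional[Frame]:
        # Panel side: if a new frame is waiting, swap it to the front and
        # free the old front buffer for the next burst.
        if self.buffers[self.back] is not None:
            self.front = self.back
            self.buffers[self.back] = None
        # Update the pixels from the front buffer (a repeat if nothing new).
        return self.buffers[self.front]


drfb = DoubleRemoteFrameBuffer()
drfb.burst_write(b"frame0")
shown = [drfb.refresh()]        # frame0 is promoted and displayed
drfb.burst_write(b"frame1")     # written while frame0 is on screen
shown.append(drfb.refresh())    # frame1 is promoted and displayed
shown.append(drfb.refresh())    # no new frame: frame1 is refreshed again
print(shown)
```

In this model, the host only ever writes the back buffer. A frame can therefore arrive in one burst at any point in the refresh period without disturbing the frame being scanned out. When no new frame arrives, the panel keeps refreshing from the front buffer.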
BurstLink's direct, burst frame transfer capability significantly reduces the energy consumption of video display in three ways. It 1) reduces accesses to DRAM, 2) increases the system's residency in idle power states, and 3) enables temporal power gating of several system components after each frame is quickly transferred into the DRFB. BurstLink can be easily implemented in modern mobile systems with minimal changes to the video display pipeline. We evaluate BurstLink using an analytical power model that we rigorously validate on an Intel Skylake mobile system. Our evaluation shows that BurstLink reduces system energy consumption for 4K planar and VR video streaming by 41% and 33%, respectively. BurstLink provides an even greater energy reduction in future video streaming systems with higher display resolutions and/or display refresh rates.
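The first two savings can be made concrete with back-of-the-envelope arithmetic. The sketch below rests on several illustrative assumptions:

- a 4K frame stored at 24 bits per pixel;
- a video frame rate equal to a 60 Hz panel refresh rate;
- a display-link payload bandwidth of 25.92 Gbit/s, roughly a four-lane eDP HBR3 link after 8b/10b coding.

The helper names are hypothetical. `frame_energy` is a generic two-state energy model, not the validated analytical power model used in the evaluation.

```python
# Back-of-the-envelope model; every constant is an illustrative assumption.
WIDTH, HEIGHT = 3840, 2160   # 4K UHD frame
BYTES_PER_PIXEL = 3          # assumed 24-bit RGB frame buffer
FPS = 60                     # assumed video frame rate == panel refresh rate
LINK_GBPS = 25.92            # assumed display-link payload bandwidth (Gbit/s)


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed decoded frame in bytes."""
    return width * height * bytes_per_pixel


def conventional_dram_traffic(frame: int, fps: int) -> int:
    """Bytes/s moved through host DRAM when the decoder writes each frame
    to DRAM and the display controller reads it back once."""
    return 2 * frame * fps


def burst_busy_fraction(frame: int, fps: int, link_gbps: float) -> float:
    """Fraction of each frame period the link is busy when the whole frame
    is sent in one burst at full link bandwidth."""
    transfer_s = frame * 8 / (link_gbps * 1e9)
    return transfer_s * fps


def frame_energy(p_active_w: float, p_idle_w: float,
                 busy_s: float, period_s: float) -> float:
    """Generic two-state model: energy (J) of a component that is active for
    busy_s and sits in a low-power state for the rest of the period."""
    return p_active_w * busy_s + p_idle_w * (period_s - busy_s)


frame = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
dram_bps = conventional_dram_traffic(frame, FPS)
busy = burst_busy_fraction(frame, FPS, LINK_GBPS)
print(f"frame size: {frame / 1e6:.1f} MB")
print(f"conventional DRAM traffic: {dram_bps / 1e9:.2f} GB/s")
print(f"link busy per frame period: {busy:.1%}")
```

Under these assumptions, the conventional pipeline moves about 3 GB/s through DRAM just to stage decoded frames. A single-burst transfer keeps the link busy for roughly 46% of each frame period, leaving the remainder available for idle states or power gating.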
