Perceptual Compression for Video Storage and Processing Systems

Compressed videos constitute 70% of Internet traffic, and video upload growth rates far outpace compute and storage improvement trends. Prior work that leverages perceptual cues such as saliency, i.e., the regions where viewers focus their attention, reduces compressed video size while maintaining perceptual quality, but it requires significant changes to video codecs and does not address how this perceptual information is stored and managed. In this paper, we propose Vignette, a compression technique and storage manager for perception-based video compression in the cloud. Vignette complements off-the-shelf compression software and hardware codec implementations. Vignette's compression technique uses a neural network to predict the saliency information used during transcoding, and its storage manager integrates perceptual information into the video storage system. Our results demonstrate the benefit of embedding information about the human visual system into the architecture of cloud video storage systems.
