LadderNet: Knowledge Transfer Based Viewpoint Prediction in 360° Video

In the past few years, virtual reality (VR) has become an enabling technology, not only enriching our visual experience but also opening new channels for businesses. Untethered mobile devices are the main platform for watching 360-degree content, so accurately predicting future viewpoints is a key challenge in improving playback quality. In this paper, we investigate the image features of 360-degree videos and the contextual information of viewpoint trajectories. Specifically, we design a ladder convolution to adapt to the distorted image, and propose LadderNet to transfer knowledge from a pre-trained model and retrieve features from the distorted image. We then combine the image features and the contextual viewpoints as inputs to a long short-term memory (LSTM) network to predict future viewpoints. We compare our approach with several state-of-the-art viewpoint prediction algorithms on two 360-degree video datasets. Results show that our approach improves the Intersection over Union (IoU) by at least 5% and meets the requirements for 360-degree video playback on mobile devices.
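The evaluation metric mentioned above, Intersection over Union (IoU), measures the overlap between the predicted viewport and the one the user actually watched. The paper does not spell out its exact computation, so the following is only a minimal sketch for axis-aligned viewports given as `(x, y, width, height)` rectangles; a full implementation on the equirectangular frame would additionally handle the longitudinal wrap-around at the 0°/360° seam:

```python
def viewport_iou(box_a, box_b):
    """IoU of two axis-aligned viewports, each given as (x, y, w, h).

    Hypothetical helper for illustration: coordinates are assumed to lie in a
    flat (non-wrapping) plane, e.g. a small region of the equirectangular frame.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis (clamped at zero when disjoint).
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

For example, two identical viewports give an IoU of 1.0, while a 2x2 viewport shifted by one unit in each direction against another 2x2 viewport overlaps in a 1x1 region, giving 1/7.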
