Viewport Forecasting in 360° Virtual Reality Videos with Machine Learning

Objective. Virtual reality (VR) cloud gaming and 360° video streaming are on the rise. With a VR headset, viewers individually choose the perspective shown on the head-mounted display by turning their head, which creates the illusion of being present in a virtual scene. In this experimental study, we applied machine learning methods to anticipate future head rotations (a) from preceding head and eye motions and (b) from the viewing statistics of other spherical video viewers.

Approach. Each of ten study participants watched 3 1/3 hours of spherical video clips while head and eye gaze motions were tracked using a VR headset with a built-in eye tracker. Machine learning models were trained on the recorded head and gaze trajectories to predict (a) changes of head orientation and (b) the viewport from population statistics.

Results. We assembled a dataset of head and gaze trajectories of spherical video viewers with high stimulus variability. We extracted statistical features from these time series and showed that a support vector machine can classify the range of future head movements with good accuracy up to a time horizon of one second. Even population statistics over only ten subjects yield prediction success above chance level. Both approaches achieved considerable prediction success from head movements, whereas gaze movements did not contribute meaningfully to prediction performance. Thus, even basic machine learning models naive to the visual content can successfully predict head movement and aspects thereof.

Significance. Viewport forecasting opens up various avenues to optimize VR rendering and transmission. Although the viewer sees only a section of the surrounding 360° sphere at any time, the entire panorama typically has to be rendered and/or transmitted, because the transmission delay must be accounted for to avoid simulator sickness caused by motion-to-photon latency. Knowing in advance where the viewer is going to look may help make cloud rendering and video streaming of VR content more efficient and, ultimately, the VR experience more appealing.
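The first approach described above, classifying the range of upcoming head movement from statistical features of the preceding motion, can be sketched as follows. This is not the study's actual pipeline; it is a minimal illustration on synthetic yaw trajectories, assuming scikit-learn is available. The feature set (standard deviation, mean/std/max of angular velocity), the 90 Hz sampling rate, and the movement-range threshold are illustrative assumptions, not values from the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_trajectory(n, moving):
    # Synthetic yaw angle (deg) as a random walk; "moving" viewers drift faster.
    step = 5.0 if moving else 0.5
    return np.cumsum(rng.normal(0.0, step, n))

def features(window):
    # Simple statistical features of a 1-second history window.
    v = np.diff(window)  # angular-velocity proxy (deg per sample)
    return [window.std(), v.mean(), v.std(), np.abs(v).max()]

X, y = [], []
for _ in range(200):
    moving = rng.random() < 0.5
    traj = make_trajectory(180, moving)       # 2 s at an assumed 90 Hz
    past, future = traj[:90], traj[90:]
    X.append(features(past))                  # features from the past second
    y.append(int(np.ptp(future) > 30.0))      # label: large vs small upcoming movement

Xtr, Xte, ytr, yte = train_test_split(
    np.array(X), np.array(y), test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", gamma="scale").fit(Xtr, ytr)
acc = clf.score(Xte, yte)  # well above the 0.5 chance level on this toy data
```

On this easily separable synthetic data the classifier is nearly perfect; the point is only to show the feature-extraction-plus-SVM structure, content-naive as in the study.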
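The second, population-statistics approach can likewise be illustrated with a toy sketch: predict a new viewer's viewport centre from where other viewers of the same clip were looking at that moment. The circular-mean aggregation and the example angles are illustrative assumptions; the study's actual population model is not reproduced here. Note that a plain arithmetic mean would fail at the 0°/360° wrap-around, which is why the angles are averaged on the unit circle.

```python
import numpy as np

def population_viewport(yaws_deg):
    """Predict a viewport centre (deg) as the circular mean of other viewers' yaw angles."""
    rad = np.deg2rad(yaws_deg)
    # Average on the unit circle so angles near 0° and 360° do not cancel out.
    centre = np.rad2deg(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean()))
    return centre % 360.0

# Nine other viewers cluster around the 0°/360° seam (wrap-around case).
others = [350, 355, 0, 5, 10, 352, 8, 358, 3]
c = population_viewport(others)  # close to 0° / 360°, not the naive mean of ~116°
```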
