Content-Aware Cubemap Projection for Panoramic Image via Deep Q-Learning

Cubemap projection (CMP) has become a promising panoramic data format because of its efficiency. However, the default CMP coordinate system, with its fixed viewpoint, may cause distortion, especially around the boundaries of each projection plane. To improve the quality of panoramic images in CMP, we propose a content-aware CMP optimization method based on deep Q-learning. The key idea is to predict an angle by which to rotate the image in equirectangular projection (ERP), so that foreground objects are kept away from the edges of each projection plane after the image is re-projected with CMP. First, the panoramic image in ERP is preprocessed to obtain a foreground pixel map. Second, we feed the foreground map into the proposed deep convolutional network (ConvNet) to obtain the predicted rotation angle. The model parameters are trained with the deep Q-learning scheme. Experimental results show that our method keeps more foreground pixels in the center of each projection plane than the baseline.
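
To make the objective concrete, the following is a minimal sketch of the quantity being optimized: the fraction of foreground pixels that land away from cube-face edges after the ERP image is rotated. It is not the proposed ConvNet or the Q-learning procedure. It assumes the rotation is a pure yaw (which is a circular horizontal shift in ERP), and it replaces the learned predictor with a brute-force search over candidate angles. The function names, the `margin` parameter, and the candidate grid are hypothetical and chosen only for illustration.

```python
import numpy as np

def edge_mask(height, width, margin=0.2):
    """ERP-shaped boolean mask: True where a pixel projects near a cube-face edge.

    `margin` is a hypothetical parameter: the width of the edge band in
    normalized face coordinates, where each face spans [-1, 1].
    """
    lon = (np.arange(width) + 0.5) / width * 2 * np.pi - np.pi    # longitude in [-pi, pi)
    lat = np.pi / 2 - (np.arange(height) + 0.5) / height * np.pi  # latitude, top to bottom
    lon, lat = np.meshgrid(lon, lat)
    # Unit viewing direction for every ERP pixel.
    d = np.stack([np.cos(lat) * np.sin(lon),
                  np.sin(lat),
                  np.cos(lat) * np.cos(lon)], axis=-1)
    a = np.sort(np.abs(d), axis=-1)
    # The largest component selects the face; second-largest / largest
    # equals max(|u|, |v|) in that face's normalized coordinates.
    return a[..., 1] / a[..., 2] > 1.0 - margin

def center_fraction(fg, yaw_deg, margin=0.2):
    """Fraction of foreground pixels outside the edge band after a yaw rotation."""
    h, w = fg.shape
    shift = int(round(yaw_deg / 360.0 * w))
    rotated = np.roll(fg, shift, axis=1)  # yaw rotation = circular shift in ERP
    total = rotated.sum()
    if total == 0:
        return 1.0  # assumption: an empty foreground map has nothing to protect
    return float(rotated[~edge_mask(h, w, margin)].sum() / total)

def best_yaw(fg, candidates=range(0, 90, 5), margin=0.2):
    """Brute-force stand-in for the learned predictor (not the paper's method).

    Under the yaw-only assumption the edge pattern repeats every 90 degrees,
    so candidates in [0, 90) cover all distinct cases.
    """
    return max(candidates, key=lambda a: center_fraction(fg, a, margin))

# Toy foreground map: a small blob at longitude 45 degrees, i.e. on the
# vertical edge shared by two side faces of the cube.
fg = np.zeros((64, 128))
fg[30:34, 78:82] = 1.0
angle = best_yaw(fg)
print(center_fraction(fg, 0), center_fraction(fg, angle))
```

In this toy case the blob initially sits on a face boundary, and the search finds a rotation that moves it into the interior of a face. A pixel count in ERP over-weights the poles; weighting each pixel by the cosine of its latitude would be one way to correct for that, which this sketch omits.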
