Densely connected convolutional network block based autoencoder for panorama map compression

Abstract: Panorama maps, a novel virtual reality (VR) format, are attracting increasing attention, but compressing panorama images remains a challenge. This paper proposes an autoencoder based on densely connected convolutional network blocks (dense blocks) to compress panorama maps. The dense blocks are designed to reuse feature maps and reduce feature redundancy. To fit the autoencoder to the properties of panorama maps, a loss function with a position-dependent weight for each pixel is proposed for training the network parameters. Building on the proposed autoencoder and the weighted loss function, a greedy block-wise training scheme is designed to avoid the vanishing gradient problem and to speed up training. During training, the autoencoder is divided into several sub-nets; after each sub-net is trained separately, the whole network is fine-tuned to achieve the best performance. Experimental results demonstrate that, compared with JPEG, the proposed autoencoder saves up to 79.69% in bit rate and obtains a gain of 7.27 dB in BD-WS-PSNR or 0.0789 in BD-WS-SSIM. It also outperforms JPEG 2000, HEVC, and VVC in both BD-WS-PSNR and BD-WS-SSIM. Subjective results further show that the proposed autoencoder recovers details of panorama images and reconstructs maps with high visual quality.
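The abstract does not give the exact form of the position-dependent weight, so the following is only a minimal sketch of the general idea. It assumes an equirectangular panorama and borrows the cosine-of-latitude row weight commonly used for WS-PSNR; the function names (`latitude_weights`, `weighted_mse`, `ws_psnr`) are hypothetical and are not taken from the paper.

```python
import math


def latitude_weights(height):
    """Per-row weights for an equirectangular image (assumed projection).

    Rows near the equator get weights close to 1; rows near the poles,
    which are stretched by the projection, get weights close to 0.
    """
    return [
        math.cos((row + 0.5 - height / 2) * math.pi / height)
        for row in range(height)
    ]


def weighted_mse(original, reconstructed):
    """Weighted mean squared error between two grayscale images.

    Both images are lists of rows with identical shape. Each squared
    pixel error is scaled by the weight of its row, then normalized
    by the sum of all weights.
    """
    weights = latitude_weights(len(original))
    numerator = 0.0
    denominator = 0.0
    for w, row_o, row_r in zip(weights, original, reconstructed):
        for o, r in zip(row_o, row_r):
            numerator += w * (o - r) ** 2
            denominator += w
    return numerator / denominator


def ws_psnr(original, reconstructed, peak=255.0):
    """PSNR computed from the weighted MSE (WS-PSNR style, in dB)."""
    wmse = weighted_mse(original, reconstructed)
    if wmse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / wmse)


if __name__ == "__main__":
    # Tiny 4x2 example: the same error costs more at the equator than at a pole.
    clean = [[10, 10], [10, 10], [10, 10], [10, 10]]
    pole_error = [[14, 14], [10, 10], [10, 10], [10, 10]]
    equator_error = [[10, 10], [14, 14], [10, 10], [10, 10]]
    print(weighted_mse(clean, pole_error) < weighted_mse(clean, equator_error))
```

Under this assumed weighting, a training loss penalizes errors near the equator more heavily than errors near the poles, which matches how the BD-WS-PSNR metric reported above weights distortion.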
