Multi-scale network (MsSG-CNN) for joint image and saliency map learning-based compression

Abstract: Lossy image compression tends to produce blurred images, which can lead to erroneous image-level understanding. The human visual system (HVS) focuses on the regions of interest in an image. Motivated by this fact, we propose a compression-decompression algorithm that focuses on the important content in different parts of the image. A saliency-guided encoding-decoding algorithm is developed using the Guided Grad-CAM (Gradient-weighted Class Activation Mapping) technique, with heat maps produced by a heat map generator trained on ResNet-50. A wider multi-scale saliency-guided convolutional neural network (MsSG-CNN) is designed, in which convolution with filters of different sizes extracts distinct features at each scale. Feature extraction followed by multilevel feature fusion helps the deep neural network (DNN) capture contextual information and produce high-quality, visually pleasing images with good resolution and fine details. The proposed algorithm is tested on the Kodak benchmark dataset, the challenging CLIC 2019 dataset, and the FDDB facial image dataset. On the Kodak dataset at low bit rates, the MS-SSIM of the proposed algorithm exceeds that of JPEG, JPEG2000, BPG, WebP, and Minnen’s approach by up to approximately 60%, 24.80%, 11.43%, 23.08%, and 75%, respectively, a significant improvement. Similarly, at high bit rates, the improvement in MS-SSIM is up to approximately 41.67%, 37.30%, 23.90%, 34.21%, and 13.33% over JPEG, JPEG2000, BPG, WebP, and Minnen’s approach, respectively. Compared with JPEG, Ballé’s, and Lee’s algorithms, PSNR improves by up to approximately 11.32%, 5%, and 5.26% at low bit rates and by 5.6%, 10.29%, and 8.7% at high bit rates, respectively. Compared with JPEG and Toderici’s algorithms, PSNR-HVS improves by up to approximately 27.27% and 19.15% at low bit rates and by 28.33% and 28% at high bit rates, respectively. Similar improvements are obtained on the FDDB and CLIC 2019 datasets, as discussed in the paper.
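
The multi-scale idea in the abstract (convolving the same input with filters of different sizes, then fusing the resulting feature maps) can be illustrated with a minimal sketch. This is not the MsSG-CNN architecture: it assumes fixed box (averaging) filters in place of learned kernels, zero padding, and channel-wise concatenation as the fusion step. The function names and the filter sizes 3, 5, and 7 are hypothetical choices made for illustration only.

```python
def box_kernel(size):
    """Return a size x size averaging kernel (a stand-in for a learned filter)."""
    value = 1.0 / (size * size)
    return [[value] * size for _ in range(size)]


def conv2d_same(image, kernel):
    """Zero-padded 'same' 2D convolution of a list-of-lists image.

    The kernel is assumed square with odd side length, so the output
    has the same height and width as the input.
    """
    height, width = len(image), len(image[0])
    k = len(kernel)
    radius = k // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - radius, x + dx - radius
                    # Pixels outside the image contribute zero (zero padding).
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def multi_scale_features(image, sizes=(3, 5, 7)):
    """Apply one filter per size; each branch sees a different receptive field."""
    return [conv2d_same(image, box_kernel(s)) for s in sizes]


def fuse_by_concatenation(feature_maps):
    """Fuse branches by stacking them per pixel (channel-wise concatenation)."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [tuple(fm[y][x] for fm in feature_maps) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # A 5x5 image with a single bright pixel in the centre.
    image = [[0.0] * 5 for _ in range(5)]
    image[2][2] = 1.0
    fused = fuse_by_concatenation(multi_scale_features(image))
    print("centre pixel, one value per scale:", fused[2][2])
    print("corner pixel, one value per scale:", fused[0][0])
```

In this sketch the small filter responds only near the bright pixel, while the larger filters spread its influence over a wider neighbourhood, which is the sense in which differently sized filters capture context at different scales. In a trained network the kernels would be learned and the fused channels passed to later layers.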
