MASIC: Deep Mask Stereo Image Compression

Stereo image compression (SIC) aims to jointly compress a pair of left and right stereoscopic images, and can achieve higher compression efficiency than single-image compression. In this paper, to benefit SIC tasks, we collect a large real-world stereo image dataset, namely Palace, which is composed of hundreds of high-resolution stereo image pairs. More importantly, we propose a novel mask stereo image compression network, namely MASIC, which jointly compresses the stereo images with high compression efficiency. Specifically, we first estimate the homography matrix between the stereo images through a regression model. Then, the left image is spatially transformed by the homography matrix, so that only the residual information needs to be encoded for the right image. To avoid wrong guidance between the two images of a stereo pair, we propose a mask prediction module (MPM) that generates a multi-channel guided mask to steer both the encoding and decoding processes. Based on the guided mask, we introduce a new mask conditional stereo entropy (MCSE) model to fully exploit the correlation between the stereo images in entropy coding. In the decoder, we develop a stereo decoding module to simultaneously decode the stereo images and enhance their quality. Experimental results show that MASIC significantly advances the performance of SIC both quantitatively and qualitatively on a variety of datasets, and is robust to changes in the parallax level between stereo images. The source code is available at https://github.com/eecoder-dyf/MASIC.
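The homography-and-residual step described above can be illustrated with a minimal NumPy sketch. It is not the MASIC implementation: the homography `H` is assumed to be given rather than regressed by a network, sampling is nearest-neighbour, and the single-channel validity mask is only a simple stand-in for the learned multi-channel guided mask produced by the MPM. The function names `warp_homography` and `masked_residual` are hypothetical.

```python
import numpy as np

def warp_homography(img, H):
    """Warp img with a 3x3 homography H (inverse mapping, nearest neighbour).

    Returns the warped image and a boolean mask that is True where the
    warped pixel was sampled from inside the source image.
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous coordinates of every destination pixel, shape (3, h*w).
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ dst
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)

    flat_in = img.reshape(h * w, -1)
    flat_out = np.zeros_like(flat_in)
    flat_out[valid] = flat_in[sy[valid] * w + sx[valid]]
    return flat_out.reshape(img.shape), valid.reshape(h, w)

def masked_residual(left, right, H):
    """Residual of the right image after predicting it from the warped left image.

    Outside the validity mask the warped image is zero, so the residual
    there is the right image itself (no guidance from the left view).
    """
    warped, mask = warp_homography(left, H)
    residual = right.astype(np.int16) - warped.astype(np.int16)
    return residual, mask

# Toy example: the right view is the left view shifted 2 pixels to the right.
left = np.arange(20, dtype=np.uint8).reshape(4, 5)
right = np.zeros_like(left)
right[:, 2:] = left[:, :-2]
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
residual, mask = masked_residual(left, right, H)
```

In this toy case the residual vanishes wherever the mask is valid, which is the situation in which coding only the residual for the right image is cheap. Real stereo pairs are not related by a single homography everywhere, which is the motivation the abstract gives for a mask that guards against wrong guidance.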
