Attention-Guided Hierarchical Structure Aggregation for Image Matting

Existing deep learning based matting algorithms primarily resort to high-level semantic features to improve the overall structure of alpha mattes. However, we argue that advanced semantics extracted from CNNs contribute unequally for alpha perception and we are supposed to reconcile advanced semantic information with low-level appearance cues to refine the foreground details. In this paper, we propose an end-to-end Hierarchical Attention Matting Network (HAttMatting), which can predict the better structure of alpha mattes from single RGB images without additional input. Specifically, we employ spatial and channel-wise attention to integrate appearance cues and pyramidal features in a novel fashion. This blended attention mechanism can perceive alpha mattes from refined boundaries and adaptive semantics. We also introduce a hybrid loss function fusing Structural SIMilarity (SSIM), Mean Square Error (MSE) and Adversarial loss to guide the network to further improve the overall foreground structure. Besides, we construct a large-scale image matting dataset comprised of 59,600 training images and 1000 test images (total 646 distinct foreground alpha mattes), which can further improve the robustness of our hierarchical structure aggregation model. Extensive experiments demonstrate that the proposed HAttMatting can capture sophisticated foreground structure and achieve state-of-the-art performance with single RGB images as input.

[1]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[2]  Chi-Keung Tang,et al.  KNN Matting , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Jingwei Tang,et al.  Learning-Based Sampling for Natural Image Matting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Feng Liu,et al.  Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Aykut Erdem,et al.  Image Matting with KL-Divergence Based Sparse Sampling , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  Xiaohui Liang,et al.  A Cluster Sampling Method for Image Matting via Sparse Coding , 2016, ECCV.

[7]  In-So Kweon,et al.  Natural Image Matting Using Deep Convolutional Neural Networks , 2016, ECCV.

[8]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[9]  Yuanjie Zheng,et al.  Learning based digital matting , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10]  Hao Lu,et al.  Indices Matter: Learning to Index for Deep Image Matting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Rüdiger Westermann,et al.  RANDOM WALKS FOR INTERACTIVE ALPHA-MATTING , 2005 .

[12]  Quan Chen,et al.  Semantic Human Matting , 2018, ACM Multimedia.

[13]  Jiangyu Liu,et al.  Disentangled Image Matting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Rynson W. H. Lau,et al.  Active Matting , 2018, NeurIPS.

[15]  Manuel Menezes de Oliveira Neto,et al.  Shared Sampling for Real‐Time Alpha Matting , 2010, Comput. Graph. Forum.

[16]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Aljoscha Smolic,et al.  AlphaGAN: Generative adversarial networks for natural image matting , 2018, BMVC.

[18]  Deepu Rajan,et al.  Improving Image Matting Using Comprehensive Sampling Sets , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Hujun Bao,et al.  A Late Fusion CNN for Digital Matting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Michael F. Cohen,et al.  Optimized Color Sampling for Robust Matting , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Dani Lischinski,et al.  A Closed-Form Solution to Natural Image Matting , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Ning Xu,et al.  Deep Image Matting , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Michael F. Cohen,et al.  An iterative optimization approach for unified image segmentation and matting , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[26]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[28]  Tae-Hyun Oh,et al.  Semantic soft segmentation , 2018, ACM Trans. Graph..

[29]  Ling-Yu Duan,et al.  CRRN: Multi-scale Guided Concurrent Reflection Removal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Jian Sun,et al.  A global sampling method for alpha matting , 2011, CVPR 2011.

[31]  Dani Lischinski,et al.  Spectral Matting , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  In-So Kweon,et al.  Automatic Trimap Generation and Consistent Matting for Light-Field Images , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  David Salesin,et al.  A Bayesian approach to digital matting , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[34]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[35]  Marc Pollefeys,et al.  Designing Effective Inter-Pixel Information Flow for Natural Image Matting , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[37]  Wei Chen,et al.  Easy Matting ‐ A Stroke Based Approach for Continuous Image Matting , 2006, Comput. Graph. Forum.

[38]  Ying Wu,et al.  Nonlocal matting , 2011, CVPR 2011.

[39]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[40]  Yuanjie Zheng,et al.  Deep Propagation Based Image Matting , 2018, IJCAI.

[41]  Pushmeet Kohli,et al.  A perceptually motivated online benchmark for image matting , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.