Estimating Image Depth in the Comics Domain

Estimating the depth of comics images is challenging because such images a) are monocular; b) lack ground-truth depth annotations; c) vary across artistic styles; d) are sparse and noisy. We therefore use an off-the-shelf unsupervised image-to-image translation method to translate the comics images to natural ones, and then use an attention-guided monocular depth estimator to predict their depth. This lets us leverage the depth annotations of existing natural images to train the depth estimator. Furthermore, our model learns to distinguish between text and images in the comics panels, which reduces text-based artefacts in the depth estimates. Our method consistently outperforms existing state-of-the-art approaches across all metrics on both the DCM and eBDtheque images. Finally, we introduce a dataset to evaluate depth prediction on comics.
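
The two-stage idea described above (translate a comics panel to a natural-looking image, then run a monocular depth estimator on the result) can be sketched as a simple composition of two functions. This is only an illustrative sketch: the names `estimate_comics_depth`, `placeholder_translate`, and `placeholder_depth` are hypothetical, and the two placeholder stages are trivial stand-ins, not the translation network or the attention-guided estimator used in the paper.

```python
from typing import Callable

import numpy as np

# Hypothetical type aliases for this sketch only.
Image = np.ndarray     # H x W x 3 array
DepthMap = np.ndarray  # H x W array


def estimate_comics_depth(
    comics_image: Image,
    translate: Callable[[Image], Image],
    estimate_depth: Callable[[Image], DepthMap],
) -> DepthMap:
    """Chain the two stages: comics -> natural-like image -> depth map."""
    natural_like = translate(comics_image)
    depth = estimate_depth(natural_like)
    # Sanity check: one depth value per input pixel.
    if depth.shape != comics_image.shape[:2]:
        raise ValueError("depth map must match the input's height and width")
    return depth


def placeholder_translate(image: Image) -> Image:
    # Stand-in for a trained image-to-image translation model (assumption):
    # it only rescales pixel values to [0, 1].
    return image.astype(np.float32) / 255.0


def placeholder_depth(image: Image) -> DepthMap:
    # Stand-in for a trained monocular depth estimator (assumption):
    # it returns the per-pixel channel mean, which is NOT real depth.
    return image.mean(axis=2)


# A tiny synthetic "panel": left half black, right half white.
panel = np.zeros((4, 6, 3), dtype=np.uint8)
panel[:, 3:, :] = 255

depth = estimate_comics_depth(panel, placeholder_translate, placeholder_depth)
```

Passing the stages in as callables keeps the sketch independent of any particular translation or depth model; in practice each placeholder would be replaced by a trained network.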
