Paper information - Object Detection for Comics using Manga109 Annotations


With the growth of digitized comics, image understanding techniques are becoming important. In this paper, we focus on object detection, a fundamental task in image understanding. Although methods based on convolutional neural networks (CNNs) have achieved good performance in object detection for naturalistic images, two problems arise when applying them to comic object detection. First, there is no large-scale annotated comics dataset, and CNN-based methods require large-scale annotations for training. Second, objects in comics overlap much more heavily than objects in naturalistic images, which causes an assignment problem in existing CNN-based methods. To solve these problems, we propose a new annotation dataset and a new CNN model. We annotated an existing image dataset of comics and created the largest annotation dataset, named Manga109-annotations. For the assignment problem, we propose a new CNN-based detector, SSD300-fork. We compared SSD300-fork with other detection methods on Manga109-annotations and confirmed that our model outperforms them in terms of mAP score.
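The overlap mentioned in the abstract is commonly quantified with intersection over union (IoU), which is also the usual matching criterion when computing mAP. The following is a minimal sketch of that measure only. The `(xmin, ymin, xmax, ymax)` box layout, the `iou` helper, and the coordinates are assumptions made for illustration; none of them come from the paper, and the sketch does not reproduce SSD300-fork or its assignment rule.

```python
from typing import Tuple

# Assumed box layout for this sketch: (xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection over union of two axis-aligned boxes."""
    # Width and height of the intersection, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical coordinates: a small box lying entirely inside a larger one,
# the kind of nesting that makes objects on a comic page overlap heavily.
outer = (100.0, 50.0, 300.0, 450.0)
inner = (150.0, 60.0, 250.0, 160.0)

print(iou(outer, inner))  # 0.125
```

Note that a fully nested box can still have a low IoU with its enclosing box, because the union is dominated by the larger box; here the inner box covers one eighth of the outer one.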
