Object Detection for Comics using Manga109 Annotations

With the growth of digitized comics, image understanding techniques are becoming increasingly important. In this paper, we focus on object detection, a fundamental task in image understanding. Although convolutional neural network (CNN)-based methods have achieved good performance in object detection for naturalistic images, two problems arise when applying them to comic object detection. First, there is no large-scale annotated comics dataset, yet CNN-based methods require large-scale annotations for training. Second, objects in comics overlap more heavily than objects in naturalistic images, and this overlap causes an assignment problem in existing CNN-based methods. To solve these problems, we propose a new annotation dataset and a new CNN model. We annotated an existing image dataset of comics and created the largest annotation dataset, named Manga109-annotations. For the assignment problem, we propose a new CNN-based detector, SSD300-fork. We compared SSD300-fork with other detection methods using Manga109-annotations and confirmed that our model outperformed them in terms of mean average precision (mAP).
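The abstract does not spell out the assignment problem. One plausible reading, assumed here, is that detectors which match each candidate box to the ground-truth box with the highest intersection over union (IoU) face a conflict when objects of different categories overlap. The sketch below is only an illustration under that assumption: the box coordinates, the category names `body` and `face`, and the candidate box `anchor` are all hypothetical and are not taken from Manga109-annotations or SSD300-fork.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not intersect.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth boxes: a face lying entirely inside a body.
body = (0, 0, 100, 200)
face = (20, 10, 80, 70)
# Hypothetical candidate box covering the upper part of the character.
anchor = (10, 0, 90, 90)

overlaps = {"body": iou(anchor, body), "face": iou(anchor, face)}
# Highest-IoU matching gives this candidate to exactly one ground truth.
assigned = max(overlaps, key=overlaps.get)
print(overlaps, assigned)
```

In this toy case the candidate overlaps both ground-truth boxes substantially, yet highest-IoU matching hands it to only one of them, so the other object loses a training candidate. How SSD300-fork addresses this is not described in the abstract.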
