Paper information - Digital Comics Image Indexing Based on Deep Learning

The digital comic book market grows every year and mixes digitized and born-digital comics. Digitized comics suffer from limited automatic content understanding, which restricts online content search and reading applications. This study shows how to combine state-of-the-art image analysis methods to encode and index comic book images into an XML-like text file. The resulting content description file can then be used to automatically split comic book images into sub-images corresponding to panels, each easily indexable with relevant information about its content. This enables advanced search for keywords spoken by specific comic characters, as well as action and scene retrieval, using natural language processing. We address panel, balloon, text, comic character, and face detection using traditional approaches and recent deep learning models, and text recognition using an LSTM model. Evaluations on a dataset composed of online library content are presented, and a new public dataset is also proposed.
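The indexing idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the file schema, so the element names (`page`, `panel`, `balloon`), the attributes, the sample detections, and the helper functions below are all hypothetical, and the paper's actual format may differ. The sketch assumes detections are already available as bounding boxes, writes them to an XML-like description, derives one crop box per panel, and searches for a keyword spoken by a given character.

```python
import xml.etree.ElementTree as ET

# Hypothetical detector output for one page (not real data from the paper).
# Bounding boxes are (x, y, width, height) in pixels.
page = {
    "image": "page_001.png",
    "panels": [
        {"bbox": (0, 0, 400, 300),
         "balloons": [{"bbox": (20, 10, 120, 60),
                       "speaker": "Alice",
                       "text": "Where is the map?"}]},
        {"bbox": (400, 0, 400, 300),
         "balloons": [{"bbox": (430, 20, 150, 70),
                       "speaker": "Bob",
                       "text": "I hid the map in the cave."}]},
    ],
}


def bbox_attrs(bbox):
    """Turn an (x, y, width, height) tuple into XML string attributes."""
    x, y, w, h = bbox
    return {"x": str(x), "y": str(y), "width": str(w), "height": str(h)}


def build_description(page):
    """Encode panels and balloons as an XML tree (assumed schema)."""
    root = ET.Element("page", image=page["image"])
    for i, panel in enumerate(page["panels"]):
        p = ET.SubElement(root, "panel", id=f"p{i}", **bbox_attrs(panel["bbox"]))
        for balloon in panel["balloons"]:
            b = ET.SubElement(p, "balloon", speaker=balloon["speaker"],
                              **bbox_attrs(balloon["bbox"]))
            b.text = balloon["text"]
    return root


def panel_crop_boxes(root):
    """Map each panel id to a (left, upper, right, lower) crop box."""
    boxes = {}
    for p in root.iter("panel"):
        x, y, w, h = (int(p.get(k)) for k in ("x", "y", "width", "height"))
        boxes[p.get("id")] = (x, y, x + w, y + h)
    return boxes


def search(root, keyword, speaker=None):
    """Return ids of panels where a balloon contains the keyword,
    optionally restricted to balloons attributed to one speaker."""
    hits = []
    for p in root.iter("panel"):
        for b in p.iter("balloon"):
            if speaker is not None and b.get("speaker") != speaker:
                continue
            if keyword.lower() in (b.text or "").lower():
                hits.append(p.get("id"))
                break  # one matching balloon is enough for this panel
    return hits


root = build_description(page)
print(ET.tostring(root, encoding="unicode"))
print(panel_crop_boxes(root))
print(search(root, "map", speaker="Bob"))
```

The crop boxes use the (left, upper, right, lower) order that, for instance, Pillow's `Image.crop` expects, so each panel could be cut out of the page image as a separate sub-image. Plain substring matching stands in here for the natural language processing step mentioned in the abstract.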
