Computational Approaches to Comics Analysis

Comics are complex documents whose reception engages cognitive processes such as scene perception, language processing, and narrative understanding. Possibly because of this complexity, they have rarely been studied in cognitive science. Modeling the stimulus ideally requires a formal description, which feature descriptors from computer vision and computational linguistics can provide. Focusing on document analysis, we review work on the computational modeling of comics. We argue that modern feature descriptors based on deep learning have progressed far enough to allow the investigation of complex material such as comics in reception studies, including experimentation and computational modeling of cognitive processes.
