Paper information - Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics

Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics

Widely used in news, business, and educational media, infographics are handcrafted to effectively communicate messages about complex and often abstract topics including 'ways to conserve the environment' and 'understanding the financial crisis'. Composed of stylistically and semantically diverse visual and textual elements, infographics pose new challenges for computer vision. While automatic text extraction works well on infographics, computer vision approaches trained on natural images fail to identify the stand-alone visual elements in infographics, or 'icons'. To bridge this representation gap, we propose a synthetic data generation strategy: we augment background patches in infographics from our Visually29K dataset with Internet-scraped icons which we use as training data for an icon proposal mechanism. On a test set of 1K annotated infographics, icons are located with 38% precision and 34% recall (the best model trained with natural images achieves 14% precision and 7% recall). Combining our icon proposals with icon classification and text extraction, we present a multi-modal summarization application. Our application takes an infographic as input and automatically produces text tags and visual hashtags that are textually and visually representative of the infographic's topics respectively.
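The synthetic data generation strategy described in the abstract (pasting scraped icons onto background patches to obtain training images with known icon locations) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `paste_icons` and `overlaps`, the nested-list image representation, the non-overlap constraint, and the `max_tries` retry limit are all assumptions made for illustration only.

```python
import random


def overlaps(a, b):
    """Return True if two boxes (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def paste_icons(patch, icons, rng, max_tries=50):
    """Paste icons onto a copy of a background patch at random positions.

    `patch` and every icon are 2D lists of pixel values (rows of ints).
    Returns (augmented_patch, boxes); each box is (x, y, w, h) and serves
    as a ground-truth label for the pasted icon.
    Assumption: icons must not overlap each other (hypothetical constraint).
    """
    height, width = len(patch), len(patch[0])
    out = [row[:] for row in patch]  # leave the input patch untouched
    boxes = []
    for icon in icons:
        ih, iw = len(icon), len(icon[0])
        if ih > height or iw > width:
            continue  # icon does not fit in this patch, skip it
        for _ in range(max_tries):
            x = rng.randint(0, width - iw)
            y = rng.randint(0, height - ih)
            candidate = (x, y, iw, ih)
            if any(overlaps(candidate, b) for b in boxes):
                continue  # try another random position
            for dy in range(ih):
                out[y + dy][x:x + iw] = icon[dy]
            boxes.append(candidate)
            break
    return out, boxes


if __name__ == "__main__":
    rng = random.Random(0)
    background = [[0] * 20 for _ in range(20)]
    icon_a = [[1] * 4 for _ in range(4)]
    icon_b = [[2] * 5 for _ in range(3)]
    augmented, labels = paste_icons(background, [icon_a, icon_b], rng)
    print(labels)
```

Because each icon's position is chosen by the generator, the bounding boxes come for free, which is what makes such synthetic images usable as detector training data without manual annotation.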
