Paper Information - Dynamic Graph Generation Network: Generating Relational Knowledge from Diagrams

In this work, we introduce a new algorithm for analyzing diagrams, which contain visual and textual information in an abstract and integrated way. Although diagrams carry richer information than image-only or language-only data, no adequate solution for understanding them automatically has been proposed, owing to their inherent multi-modality and arbitrary layouts. To tackle this problem, we propose a unified diagram-parsing network that generates knowledge from diagrams, built on an object detector and a recurrent neural network designed for graph structures. Specifically, we propose a dynamic graph-generation network based on dynamic memory and graph theory. We explore the dynamics of information in a diagram through the gate activations of gated recurrent unit (GRU) cells. On publicly available diagram datasets, our model achieves state-of-the-art results, outperforming the baselines. Moreover, further experiments on question answering show the potential of the proposed method for various applications.
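
As a rough illustration of the idea described in the abstract, the sketch below runs a standard GRU cell over ordered pairs of detected objects and reads out an edge probability together with the mean update-gate activation at each step. It is not the paper's architecture. The pair features (concatenated box coordinates), the random untrained weights, the readout vector `w_out`, and the names `GRUCell` and `predict_edges` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell with biases omitted; weights are random, not trained."""

    def __init__(self, input_size, hidden_size, rng):
        self.Wz = rand_matrix(hidden_size, input_size, rng)
        self.Uz = rand_matrix(hidden_size, hidden_size, rng)
        self.Wr = rand_matrix(hidden_size, input_size, rng)
        self.Ur = rand_matrix(hidden_size, hidden_size, rng)
        self.Wh = rand_matrix(hidden_size, input_size, rng)
        self.Uh = rand_matrix(hidden_size, hidden_size, rng)

    def step(self, x, h):
        # Update gate z and reset gate r, both in (0, 1).
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        # Candidate state computed from the input and the reset-scaled old state.
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Convention used here: z close to 1 keeps the old state.
        h_new = [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]
        return h_new, z, r


def predict_edges(boxes, cell, w_out, hidden_size):
    """Visit every ordered pair of objects and score a directed edge for it."""
    h = [0.0] * hidden_size
    edges = []
    for i in range(len(boxes)):
        for j in range(len(boxes)):
            if i == j:
                continue
            x = boxes[i] + boxes[j]  # assumed pair feature: concatenated coordinates
            h, z, _ = cell.step(x, h)
            prob = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
            mean_update_gate = sum(z) / len(z)
            edges.append((i, j, prob, mean_update_gate))
    return edges


# Hypothetical detector output: three boxes as normalized (x1, y1, x2, y2).
rng = random.Random(0)
boxes = [[0.1, 0.1, 0.3, 0.3], [0.5, 0.2, 0.7, 0.4], [0.2, 0.6, 0.4, 0.9]]
hidden_size = 4
cell = GRUCell(input_size=8, hidden_size=hidden_size, rng=rng)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
edges = predict_edges(boxes, cell, w_out, hidden_size)
for i, j, prob, gate in edges:
    print(f"edge {i}->{j}: p={prob:.3f}, mean update gate={gate:.3f}")
```

Because the hidden state is carried across pairs, each edge score depends on the pairs visited earlier, and the logged gate values give a simple way to inspect how much of that history is retained at each step.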
