Multimodal Prediction based on Graph Representations

This paper proposes a learning model based on rank-fusion graphs that is generally applicable to multimodal prediction tasks, such as multimodal regression and image classification. Rank-fusion graphs encode information from multiple descriptors and retrieval models, and can therefore capture underlying relationships between modalities, samples, and the collection itself. The solution encodes multiple ranks for a query (or test sample), each defined according to a different criterion, into a graph. We then project the generated graph into an induced vector space, creating fusion vectors, targeting broader generality and efficiency. A fusion vector estimator is then built to infer whether a multimodal input object refers to a class or not. The resulting fusion model performs better than early-fusion and late-fusion alternatives. Experiments on multiple multimodal and visual datasets, using several descriptors and retrieval models, demonstrate that our learning model is highly effective in different prediction scenarios involving visual, textual, and multimodal features, yielding better effectiveness than state-of-the-art methods.
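The pipeline described above (several ranks per query, fused into a graph, projected into a vector, then passed to an estimator) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not specify the weighting scheme, the embedding, or the estimator, so every choice below is an assumption made for illustration. In particular, the toy collection, the two rankers (Euclidean distance on a visual descriptor, Jaccard distance on text tokens), the 1/position node weights, the co-occurrence edge weights, and the nearest-centroid estimator are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def jaccard_distance(a, b):
    """Jaccard distance between two non-empty token sets."""
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical rankers: one (modality key, distance function) pair per criterion.
RANKERS = [("visual", math.dist), ("text", jaccard_distance)]


def top_k(query, collection, modality, distance, k):
    """Ids of the k collection items closest to the query for one modality."""
    ids = sorted(collection)
    ids.sort(key=lambda i: distance(query[modality], collection[i][modality]))
    return ids[:k]


def fusion_graph(ranked_lists):
    """Fuse several ranked lists into weighted nodes and edges (assumed scheme)."""
    nodes, edges = defaultdict(float), defaultdict(float)
    for ranked in ranked_lists:
        for position, item in enumerate(ranked, start=1):
            nodes[item] += 1.0 / position          # higher ranks weigh more
        for pair in combinations(sorted(ranked), 2):
            edges[pair] += 1.0                     # co-occurrence in one rank
    return nodes, edges


def fusion_vector(query, collection, k=2):
    """Project the fusion graph onto a fixed node-and-edge vocabulary."""
    ranked_lists = [top_k(query, collection, m, d, k) for m, d in RANKERS]
    nodes, edges = fusion_graph(ranked_lists)
    ids = sorted(collection)
    node_part = [nodes.get(i, 0.0) for i in ids]
    edge_part = [edges.get(p, 0.0) for p in combinations(ids, 2)]
    return node_part + edge_part


def fit_centroids(collection, labels, k=2):
    """Stand-in estimator: mean fusion vector per class."""
    grouped = defaultdict(list)
    for i, sample in collection.items():
        grouped[labels[i]].append(fusion_vector(sample, collection, k))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(query, collection, centroids, k=2):
    """Assign the class whose centroid is closest to the query's fusion vector."""
    vector = fusion_vector(query, collection, k)
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Toy multimodal collection (invented data, for illustration only).
COLLECTION = {
    "a": {"visual": (0.0, 0.1), "text": {"water", "river"}},
    "b": {"visual": (0.1, 0.0), "text": {"water", "rain"}},
    "c": {"visual": (1.0, 0.9), "text": {"sun", "beach"}},
    "d": {"visual": (0.9, 1.0), "text": {"sun", "park"}},
}
LABELS = {"a": "flood", "b": "flood", "c": "dry", "d": "dry"}

CENTROIDS = fit_centroids(COLLECTION, LABELS)
QUERY = {"visual": (0.05, 0.05), "text": {"water", "flood"}}
print(predict(QUERY, COLLECTION, CENTROIDS))
```

In this sketch the fusion vector has one dimension per collection item plus one per item pair, which only scales to small collections; a practical system would presumably restrict the vocabulary, and any standard classifier or regressor could replace the nearest-centroid rule.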
