Logically Consistent Loss for Visual Question Answering

Given an image, background knowledge, and a set of questions about an object, human learners answer the questions consistently regardless of question form or semantic task. Current neural-network-based Visual Question Answering (VQA) models, despite their impressive performance, cannot ensure such consistency because they rely on the independent and identically distributed (i.i.d.) assumption. We propose a new, model-agnostic logic constraint that tackles this issue by formulating a logically consistent loss within a multi-task learning framework, together with data organisations we call family-batch and hybrid-batch. To demonstrate the usefulness of this proposal, we train and evaluate MAC-net-based VQA models with and without the proposed logically consistent loss and data organisation. The experiments confirm that the proposed loss formulation and the introduction of the hybrid-batch lead to greater consistency as well as better performance. Although the proposed approach is tested with MAC-net, it can be applied to any other QA method whenever logical consistency between answers exists.

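The abstract does not include an implementation, but the idea of a logically consistent loss over a family-batch can be illustrated with a minimal sketch. The sketch below assumes PyTorch and a Łukasiewicz-style soft implication between entailed question pairs grouped in one family-batch; the function name, the `entailment_pairs` encoding, and the weight `lam` are hypothetical illustrations, not the authors' actual formulation.

```python
# Hypothetical sketch: a consistency penalty added to the per-question VQA loss
# over one "family-batch" (related questions about the same image).
import torch
import torch.nn.functional as F


def logically_consistent_loss(logits, answer_ids, entailment_pairs, lam=0.5):
    """logits: (B, A) answer scores for the B questions of one family-batch.
    answer_ids: (B,) ground-truth answer indices.
    entailment_pairs: list of (i, j) pairs meaning "answering question i
        correctly entails answering question j correctly".
    lam: weight of the consistency term (assumed hyper-parameter).
    """
    # Standard per-question answer-classification loss (the multi-task term).
    task_loss = F.cross_entropy(logits, answer_ids)

    # Probability the model assigns to each question's ground-truth answer.
    probs = logits.softmax(dim=-1)
    p_true = probs.gather(1, answer_ids.unsqueeze(1)).squeeze(1)

    # Soft implication p_i -> p_j via the Lukasiewicz t-norm, whose truth value
    # is min(1, 1 - p_i + p_j); its violation is relu(p_i - p_j).
    consistency = logits.new_zeros(())
    for i, j in entailment_pairs:
        consistency = consistency + torch.relu(p_true[i] - p_true[j])

    return task_loss + lam * consistency
```

In this sketch the consistency term is zero whenever the model is at least as confident on an entailed question as on the question that entails it, so a standard cross-entropy-trained model is only penalised when its answers across the family-batch are logically inconsistent.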