Self-Adaptive Neural Module Transformer for Visual Question Answering

[1]  Christopher Kanan,et al.  Visual question answering: Datasets, algorithms, and future challenges , 2016, Comput. Vis. Image Underst..

[2]  Trevor Darrell,et al.  Explainable Neural Computation via Stack Neural Module Networks , 2018, ECCV.

[3]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[4]  Margaret Mitchell,et al.  VQA: Visual Question Answering , 2015, International Journal of Computer Vision.

[5]  Qi Wu,et al.  Visual question answering: A survey of methods and datasets , 2016, Comput. Vis. Image Underst..

[6]  Yueting Zhuang,et al.  Frame Augmented Alternating Attention Network for Video Question Answering , 2020, IEEE Transactions on Multimedia.

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Yue Gao,et al.  Beyond Text QA: Multimedia Answer Generation by Harvesting Web Information , 2013, IEEE Transactions on Multimedia.

[9]  Quoc V. Le,et al.  Massive Exploration of Neural Machine Translation Architectures , 2017, EMNLP.

[10]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[11]  Dan Klein,et al.  Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Gang Hua,et al.  Multimedia Big Data Computing , 2015, IEEE Multim..

[13]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[14]  Alexander J. Smola,et al.  Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Yash Goyal,et al.  Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Niklas Carlsson,et al.  Optimized Adaptive Streaming of Multi-video Stream Bundles , 2017, IEEE Transactions on Multimedia.

[17]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[18]  Yash Goyal,et al.  Yin and Yang: Balancing and Answering Binary Visual Questions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[20]  Lei Zhang,et al.  Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Yuchun Ma,et al.  AddressNet: Shift-Based Primitives for Efficient Convolutional Neural Networks , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[22]  Li Fei-Fei,et al.  Inferring and Executing Programs for Visual Reasoning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Zhou Yu,et al.  Deep Modular Co-Attention Networks for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Lei Yu,et al.  Question Quality Analysis and Prediction in Community Question Answering Services with Coupled Mutual Reinforcement , 2017, IEEE Transactions on Services Computing.

[25]  Trevor Darrell,et al.  Learning to Reason: End-to-End Module Networks for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Christopher D. Manning,et al.  Compositional Attention Networks for Machine Reasoning , 2018, ICLR.

[27]  David Mascharka,et al.  Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Chenliang Xu,et al.  Dancelets Mining for Video Recommendation Based on Dance Styles , 2017, IEEE Transactions on Multimedia.

[29]  Li Fei-Fei,et al.  CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Hanqing Lu,et al.  Enhancing Visual Question Answering Using Dropout , 2018, ACM Multimedia.

[31]  Allan Jabri,et al.  Revisiting Visual Question Answering Baselines , 2016, ECCV.

[32]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[33]  Aaron C. Courville,et al.  FiLM: Visual Reasoning with a General Conditioning Layer , 2017, AAAI.

[34]  Peng Gao,et al.  Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Meng Wang,et al.  Stochastic Multiview Hashing for Large-Scale Near-Duplicate Video Retrieval , 2017, IEEE Transactions on Multimedia.