Visual Question Answering From Remote Sensing Images

Remote sensing images carry vast amounts of information beyond land cover or land use. They contain visual and structural information that can be queried to obtain high-level information about specific image content or about relational dependencies between the sensed objects. This paper explores the possibility of using questions formulated in natural language as a generic and accessible way to extract this type of information from remote sensing images, i.e. visual question answering. We introduce an automatic way to create a dataset from OpenStreetMap data and present preliminary results. Our proposed approach is based on deep learning and is trained on this new dataset.
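The abstract does not detail how question/answer pairs are derived from OpenStreetMap. A minimal sketch of one plausible approach, templated questions over per-tile object annotations, is given below; the categories, templates, and the `osm_objects` input format are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical template-based question/answer generation from OpenStreetMap
# objects associated with one image tile. Templates and object labels are
# illustrative assumptions only.
from collections import Counter

TEMPLATES = {
    "count": "How many {obj}s are there in the image?",
    "presence": "Is there a {obj} in the image?",
}

def generate_qa_pairs(osm_objects):
    """Build (question, answer) pairs from a list of OSM object labels,
    e.g. ["building", "building", "road"], for one image tile."""
    counts = Counter(osm_objects)
    pairs = []
    for obj in sorted(counts):
        pairs.append((TEMPLATES["count"].format(obj=obj), str(counts[obj])))
        pairs.append((TEMPLATES["presence"].format(obj=obj), "yes"))
    return pairs

if __name__ == "__main__":
    for question, answer in generate_qa_pairs(["building", "building", "road"]):
        print(question, "->", answer)
```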
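Likewise, the abstract only states that the approach is deep-learning based. As a sketch of a common visual question answering setup (not necessarily the authors' architecture), the PyTorch model below fuses a CNN image encoder with an RNN question encoder by element-wise product and classifies over a fixed answer vocabulary; all layer sizes and the fusion choice are assumptions.

```python
# Minimal VQA model sketch: CNN image features and GRU question features,
# fused by element-wise product, then classified over a fixed answer set.
# Architecture details are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class VQAModel(nn.Module):
    def __init__(self, vocab_size, num_answers, hidden_dim=1024):
        super().__init__()
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Linear(cnn.fc.in_features, hidden_dim)  # project image features
        self.image_encoder = cnn
        self.embedding = nn.Embedding(vocab_size, 300)
        self.question_encoder = nn.GRU(300, hidden_dim, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, image, question_tokens):
        img_feat = self.image_encoder(image)                  # (B, hidden_dim)
        _, h = self.question_encoder(self.embedding(question_tokens))
        q_feat = h[-1]                                        # (B, hidden_dim)
        return self.classifier(img_feat * q_feat)             # element-wise fusion

# Smoke test with random inputs.
model = VQAModel(vocab_size=1000, num_answers=50)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 50])
```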
