RSVQA Meets BigEarthNet: A New, Large-Scale, Visual Question Answering Dataset for Remote Sensing

Visual Question Answering (VQA) is a new task that can facilitate the extraction of information from images through textual queries: it aims to answer an open-ended question, formulated in natural language, about a given image. In this work, we introduce a new dataset for visual question answering on remote sensing images. This large-scale, open-access dataset extracts image/question/answer triplets from the BigEarthNet dataset and contains nearly 15 million samples. We present the dataset construction procedure, its characteristics, and first results obtained with a deep-learning-based method. These first results show that visual question answering is a challenging task that opens interesting new research avenues at the interface of remote sensing and natural language processing. The dataset and the code to create and process it are freely available at https://rsvqa.sylvainlobry.com/