Improving the Cross-Lingual Generalisation in Visual Question Answering