Adversarial Graph Convolutional Network for Cross-Modal Retrieval

The completeness of semantic expression plays an important role in cross-modal retrieval, as it helps align cross-modal data and thus narrow the modality gap. However, because semantics are abstract, the same topic can be described from multiple aspects, so a single sample may express its semantics only incompletely. To obtain complementary semantic information and strengthen the shared information among samples with the same semantics, we employ a graph convolutional network (GCN) to reconstruct each sample's representation from the adjacency relationships between the sample and its neighbors. We construct a local graph for each instance and propose a novel Graph Feature Generator, built from a GCN and a fully connected network, which reconstructs node features from the local graph and maps the features of both modalities into a common space. The Graph Feature Generator and a Graph Feature Discriminator then play a minimax game to produce modality-invariant graph feature representations. Experiments on three benchmark datasets demonstrate the superiority of the proposed model over several state-of-the-art methods.
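The abstract describes the architecture only at a high level; the PyTorch sketch below illustrates one plausible reading of it. The layer widths, the k-NN local-graph construction, the single-layer GCN generator, and the binary cross-entropy minimax objective are all assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch of the Graph Feature Generator / Discriminator pair.
# All sizes, the normalized-adjacency construction, and the loss
# formulation are assumptions, not the authors' exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalized_adjacency(sim, k=5):
    """Build a k-NN local graph from pairwise similarities and apply
    the standard D^{-1/2}(A+I)D^{-1/2} normalization (assumed choice)."""
    n = sim.size(0)
    topk = sim.topk(k, dim=1).indices
    adj = torch.zeros(n, n).scatter_(1, topk, 1.0)
    adj = ((adj + adj.t() + torch.eye(n)) > 0).float()  # symmetrize, self-loops
    deg_inv_sqrt = adj.sum(1).pow(-0.5)
    return deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)

class GraphFeatureGenerator(nn.Module):
    """One GCN layer over the local graph, then a fully connected
    mapping into the common space (assumed depth and widths)."""
    def __init__(self, in_dim=4096, hid_dim=1024, common_dim=256):
        super().__init__()
        self.gcn_weight = nn.Linear(in_dim, hid_dim, bias=False)
        self.fc = nn.Linear(hid_dim, common_dim)

    def forward(self, x, adj):
        # x: (n, in_dim) node features; adj: (n, n) normalized adjacency
        h = F.relu(self.gcn_weight(adj @ x))  # aggregate neighbor features
        return self.fc(h)                     # project into common space

class GraphFeatureDiscriminator(nn.Module):
    """Predicts which modality a common-space feature came from."""
    def __init__(self, common_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(common_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.net(z)  # logit: image (1) vs. text (0)

def adversarial_losses(disc, z_img, z_txt):
    """Minimax game: the discriminator separates modalities while the
    generators are trained to fool it (one common formulation; the
    paper's exact objective may differ)."""
    ones = torch.ones(z_img.size(0), 1)
    zeros = torch.zeros(z_txt.size(0), 1)
    d_loss = (F.binary_cross_entropy_with_logits(disc(z_img.detach()), ones)
              + F.binary_cross_entropy_with_logits(disc(z_txt.detach()), zeros))
    g_loss = (F.binary_cross_entropy_with_logits(disc(z_img), zeros)
              + F.binary_cross_entropy_with_logits(disc(z_txt), ones))
    return d_loss, g_loss

# Example: reconstruct features for a 6-node local image graph.
x_img = torch.randn(6, 4096)
adj = normalized_adjacency(x_img @ x_img.t(), k=3)
z_img = GraphFeatureGenerator()(x_img, adj)  # (6, 256) common-space features
```

In training, discriminator and generator updates would alternate, with the retrieval objectives (e.g., semantic or similarity losses, which the abstract does not specify) added on top of the generator loss.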