Multiple-Input Multiple-Output Fusion Network for Generalized Zero-Shot Learning

Generalized zero-shot learning (GZSL), in which models are trained on data from seen classes and tested on data from both seen and unseen classes, has attracted considerable attention recently. Most existing methods attempt to learn a mapping from the visual space to the semantic space, but such a mapping can easily lead to the domain shift problem. To address this issue, we propose a Multiple-Input Multiple-Output Fusion Network for GZSL. The network can generate a common semantic representation similar to the one obtained from paired inputs, even when given only the class semantic embeddings. This makes it possible to synthesize pseudo samples from the attributes of unseen classes. Extensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed model.