Paper information - Layer-wise enhanced transformer with multi-modal fusion for image caption - 字舞流文

Layer-wise enhanced transformer with multi-modal fusion for image caption

Yi Wang | Dexin Zhao | Jingdan Li