In various clinical scenarios, medical image is crucial in disease diagnosis and treatment. Different modalities of medical images provide complementary information and jointly helps doctors to make accurate clinical decision. However, due to clinical and practical restrictions, certain imaging modalities may be unavailable nor complete. To impute missing data with adequate clinical accuracy, here we propose a framework called self-supervised collaborative learning to synthesize missing modality for medical images. The proposed method comprehensively utilize all available information correlated to the target modality from multi-source-modality images to generate any missing modality in a single model. Different from the existing methods, we introduce an auto-encoder network as a novel, self-supervised constraint, which provides target-modality-specific information to guide generator training. In addition, we design a modality mask vector as the target modality label. With experiments on multiple medical image databases, we demonstrate a great generalization ability as well as specialty of our method compared with other state-of-the-arts.