Towards Automatic Diagnosis from Multi-modal Medical Data

Many healthcare applications would benefit significantly from processing and analyzing multi-modal data. In this paper, we propose a novel multi-task, multi-modal, and multi-attention framework that learns and aligns information from multiple medical sources. Through experiments on a public medical dataset, we show that combining features from images (e.g., X-rays) and text (e.g., clinical reports), and sharing information across tasks (e.g., X-ray classification, autoencoding, and diagnosis generation) and across domains, boosts the performance of diagnosis generation (86.0% in terms of BLEU@4).
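To make the multi-task, multi-modal setup concrete, the following is a minimal sketch, not the authors' exact architecture, of how image and text features could be fused with attention and fed to three task heads (X-ray classification, text reconstruction as an autoencoder, and diagnosis generation). All module names, dimensions, and the choice of CNN/GRU encoders are illustrative assumptions.

```python
# Hypothetical sketch of a multi-task, multi-modal model with attention
# fusion; component choices (CNN image encoder, GRU text encoder) are
# assumptions, not the paper's specified architecture.
import torch
import torch.nn as nn

class MultiModalMultiTask(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=256, num_classes=14):
        super().__init__()
        # Image encoder: small CNN producing a grid of region features.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Text encoder: token embedding + GRU over clinical-report tokens.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.text_enc = nn.GRU(embed_dim, embed_dim, batch_first=True)
        # Cross-modal attention: the report summary attends over image regions.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        # Task heads: classification, report reconstruction, report generation.
        self.cls_head = nn.Linear(embed_dim, num_classes)
        self.recon_head = nn.Linear(embed_dim, vocab_size)
        self.gen_dec = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.gen_head = nn.Linear(embed_dim, vocab_size)

    def forward(self, image, report_tokens):
        # Visual features as a sequence of regions: (B, H*W, D).
        v = self.cnn(image).flatten(2).transpose(1, 2)
        # Text features (B, T, D); h holds the final summary state.
        t, h = self.text_enc(self.embed(report_tokens))
        query = h[-1].unsqueeze(1)                 # (B, 1, D) text summary
        fused, _ = self.attn(query, v, v)          # attend over image regions
        fused = fused.squeeze(1)                   # (B, D) shared representation
        cls_logits = self.cls_head(fused)          # task 1: X-ray classification
        recon_logits = self.recon_head(t)          # task 2: report reconstruction
        dec_out, _ = self.gen_dec(t, fused.unsqueeze(0).contiguous())
        gen_logits = self.gen_head(dec_out)        # task 3: diagnosis generation
        return cls_logits, recon_logits, gen_logits

# Usage: one shared backbone, three task-specific outputs.
model = MultiModalMultiTask()
img = torch.randn(2, 1, 64, 64)
tok = torch.randint(0, 5000, (2, 20))
cls_logits, recon_logits, gen_logits = model(img, tok)
```

In a setup like this, the three task losses would typically be combined into a single weighted training objective, which is one way the tasks can share information as described above.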