Multimodal Prediction based on Graph Representations