Implications of Multimodal Deep Learning for Textual and Visual Data