Expanding Perspectives for Comprehending Visual Images in Multimodal Texts