Align and Attend: Multimodal Summarization with Dual Contrastive Losses