Embodied Multimodal Agents to Bridge the Understanding Gap