A Multimodal Anthropomorphic Agent which Learns Visual Information Through Interactions