Connecting Language with Perception and Action for Human-Robot Interaction