Words are all you need? Capturing human sensory similarity with textual descriptors