Testing AI performance on less frequent aspects of language reveals insensitivity to underlying meaning