An experimental study measuring agreement among human annotators when categorizing commonsense sentences