Can we trust the evaluation on ChatGPT?