Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement