Rethinking Benchmark and Contamination for Language Models with Rephrased Samples