Web Documents Similarity Using K-Shingle Tokens and MinHash Technique