Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective