Exploring Denoised Cross-video Contrast for Weakly-supervised Temporal Action Localization