Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation