Federated Self-supervised Learning for Video Understanding