Masked Motion Encoding for Self-Supervised Video Representation Learning