Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders