CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation