Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention