MED-VT: Multiscale Encoder-Decoder Video Transformer with Application to Object Segmentation
Supplemental Material