MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training