Self-labeling video prediction