Learning Disentangled Representations of Video with Missing Data