Perception Updating Networks: On architectural constraints for interpretable video generative models