Are 3D convolutional networks inherently biased towards appearance?