Supplemental: Multiview Consistency as Supervisory Signal for Learning Shape and Pose Prediction

In the main text, we briefly described the formulation of a view consistency loss L(x̄, C; V) that measures the inconsistency between a shape x̄, viewed according to camera C, and a depth or mask image V. Crucially, this loss is differentiable with respect to both pose and shape. As indicated in the main text, our formulation builds upon the previously proposed differentiable ray consistency formulation [3], with some innovations that make it differentiable with respect to pose. For clarity of presentation, we first present our full formulation and then discuss its relation to previous techniques (a similar discussion can also be found in the main text).
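To make the notion of a per-ray inconsistency concrete, the following is a minimal sketch of the expected-cost idea behind the differentiable ray consistency formulation [3], specialized to a single ray and a binary mask pixel. It assumes the voxels crossed by the ray are already listed from near to far, each with an occupancy probability in [0, 1]. The function names are hypothetical, and the sketch deliberately omits the pose-dependent ray-voxel intersection that makes our loss differentiable with respect to C; it only illustrates how the loss becomes a smooth function of the shape x̄.

```python
from __future__ import annotations

from typing import Sequence


def ray_event_probs(occ: Sequence[float]) -> list[float]:
    """Probability that the ray terminates at each voxel it crosses
    (assumed ordered near to far), plus a final entry for the ray
    escaping the grid without hitting anything."""
    probs = []
    survive = 1.0  # probability the ray passed through all earlier voxels
    for x in occ:
        probs.append(survive * x)  # stops here: all earlier empty, this one occupied
        survive *= 1.0 - x
    probs.append(survive)  # escape event
    return probs


def mask_ray_loss(occ: Sequence[float], is_foreground: bool) -> float:
    """Expected inconsistency of one ray with one binary mask pixel
    (hypothetical helper). A foreground pixel costs 1 if the ray escapes;
    a background pixel costs 1 if the ray terminates inside the grid."""
    probs = ray_event_probs(occ)
    escape = probs[-1]
    # Event probabilities sum to 1, so P(terminate) = 1 - P(escape).
    return escape if is_foreground else 1.0 - escape


if __name__ == "__main__":
    print(ray_event_probs([0.5, 0.5]))
    print(mask_ray_loss([0.5, 0.5], True), mask_ray_loss([0.5, 0.5], False))
```

For example, a ray crossing two voxels with occupancy 0.5 escapes with probability 0.25, so a foreground pixel incurs an expected cost of 0.25 and a background pixel 0.75. Because every term is a product of occupancies, the cost is differentiable with respect to them.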