The confounding of translation and rotation in reconstruction from multiple views

If 3D rigid motion is estimated with some error, a distorted version of the scene structure will in turn be computed. Of computational interest are those regions in space where the distortions are such that the depths become negative, because in order to be visible the scene has to lie in front of the image. The stability analysis for the structure-from-motion problem presented in this paper investigates the optimal relationship between the errors in the estimated translational and rotational parameters of a rigid motion, that is, the relationship that results in the estimation of a minimum number of negative depth values. The input used is the value of the flow along some direction, which is more general than optic flow or correspondence. For a planar retina it is shown that the optimal configuration is achieved when the projections of the translational and rotational errors on the image plane are perpendicular. Furthermore, the projections of the actual and the estimated translation lie on a line passing through the image center. For a spherical retina, given a rotational error, the optimal translation is the correct one, while given a translational error, the optimal rotational error is normal to the translational one at an equal distance from the real and estimated translations. The proofs, besides illuminating the confounding of translation and rotation in structure from motion, have an important application to ecological optics, explaining differences of planar and spherical eye or camera designs in motion and shape estimation.
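As an illustration of the negative-depth criterion, the following minimal sketch counts how many scene points would be assigned a negative depth when depth is recovered from the flow along one direction using an erroneous motion estimate. It is not the analysis of the paper: the planar-retina motion field equations, their sign convention, the unit focal length, the uniform sampling of image points, depths and flow directions, and all function names are assumptions made for this example.

```python
import math
import random


def translational_flow(x, y, t, f=1.0):
    """Depth-independent part of the translational flow (to be divided by Z).

    Assumed convention: planar retina, focal length f, t = (U, V, W).
    """
    U, V, W = t
    return (-U * f + x * W, -V * f + y * W)


def rotational_flow(x, y, w, f=1.0):
    """Rotational flow for w = (alpha, beta, gamma), same assumed convention."""
    a, b, g = w
    u = a * x * y / f - b * (x * x / f + f) + g * y
    v = a * (y * y / f + f) - b * x * y / f - g * x
    return (u, v)


def count_negative_depths(t_true, w_true, t_est, w_est, n_points=2000, seed=0):
    """Count points whose estimated depth is negative.

    For each random image point, true depth Z > 0 and unit direction n,
    the flow along n is generated from the true motion; the depth is then
    re-estimated from that single measurement using the estimated motion.
    """
    rng = random.Random(seed)
    negative = 0
    for _ in range(n_points):
        x = rng.uniform(-0.5, 0.5)
        y = rng.uniform(-0.5, 0.5)
        Z = rng.uniform(1.0, 10.0)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        nx, ny = math.cos(theta), math.sin(theta)

        # Measured flow component along n under the true motion.
        tr = translational_flow(x, y, t_true)
        rot = rotational_flow(x, y, w_true)
        u_n = (tr[0] * nx + tr[1] * ny) / Z + rot[0] * nx + rot[1] * ny

        # Estimated depth is num / den under the estimated motion;
        # it is negative exactly when num and den have opposite signs.
        tr_e = translational_flow(x, y, t_est)
        rot_e = rotational_flow(x, y, w_est)
        num = tr_e[0] * nx + tr_e[1] * ny
        den = u_n - (rot_e[0] * nx + rot_e[1] * ny)
        if num * den < 0:
            negative += 1
    return negative


if __name__ == "__main__":
    t_true, w_true = (0.1, 0.0, 1.0), (0.0, 0.0, 0.0)
    print(count_negative_depths(t_true, w_true, t_true, w_true))
    print(count_negative_depths(t_true, w_true, t_true, (0.0, 0.5, 0.0)))
```

With a correct motion estimate no point receives a negative depth; introducing a rotational or translational error and comparing the resulting counts gives a rough numerical feel for how the two kinds of error interact.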