Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on N-Spheres

Many computer vision challenges require continuous outputs, but tend to be solved by discrete classification. The reason is classification's natural containment within a probability n-simplex, as defined by the popular softmax activation function. Regular regression lacks such a closed geometry, leading to unstable training and convergence to suboptimal local minima. Starting from this insight we revisit regression in convolutional neural networks. We observe many continuous output problems in computer vision are naturally contained in closed geometrical manifolds, like the Euler angles in viewpoint estimation or the normals in surface normal estimation. A natural framework for posing such continuous output problems are n-spheres, which are naturally closed geometric manifolds defined in the R^(n+1) space. By introducing a spherical exponential mapping on n-spheres at the regression output, we obtain well-behaved gradients, leading to stable training. We show how our spherical regression can be utilized for several computer vision challenges, specifically viewpoint estimation, surface normal estimation and 3D rotation estimation. For all these problems our experiments demonstrate the benefit of spherical regression. All paper resources are available at https://github.com/leoshine/Spherical_Regression.

[1]  Marc Pollefeys,et al.  Pulling Things out of Perspective , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  H. Flanders Differential Forms with Applications to the Physical Sciences , 1964 .

[3]  Jin Gao,et al.  Transfer Learning Based Visual Tracking with Gaussian Processes Regression , 2014, ECCV.

[4]  Seunghoon Hong,et al.  Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[5]  Jitendra Malik,et al.  Viewpoints and keypoints , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Silvio Savarese,et al.  Beyond PASCAL: A benchmark for 3D object detection in the wild , 2014, IEEE Winter Conference on Applications of Computer Vision.

[7]  Mathieu Aubry,et al.  Crafting a multi-task CNN for viewpoint estimation , 2016, BMVC.

[8]  Yann LeCun,et al.  Synergistic Face Detection and Pose Estimation with Energy-Based Models , 2004, J. Mach. Learn. Res..

[9]  Sebastian Nowozin,et al.  Deep Directional Statistics: Pose Estimation with Uncertainty Quantification , 2018, ECCV.

[10]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[12]  Gang Wang,et al.  Multi-modal Unsupervised Feature Learning for RGB-D Scene Labeling , 2014, ECCV.

[13]  J. Michael McCarthy,et al.  Introduction to theoretical kinematics , 1990 .

[14]  Lucas Beyer,et al.  Biternion Nets: Continuous Head Pose Regression from Discrete Training Labels , 2015, GCPR.

[15]  Abhinav Gupta,et al.  Marr Revisited: 2D-3D Alignment via Surface Normal Prediction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Renjie Liao,et al.  GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Klaudia Frankfurter,et al.  Symmetries And Laplacians Introduction To Harmonic Analysis Group Representations And Applications , 2016 .

[20]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[21]  Ayellet Tal,et al.  Viewpoint Estimation-Insights & Model , 2018, ECCV 2018.

[22]  Ian D. Reid,et al.  LieNet: Real-time Monocular Object Instance 6D Pose Estimation , 2018, BMVC.

[23]  Roberto Cipolla,et al.  Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Ersin Yumer,et al.  Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  René Vidal,et al.  3D Pose Regression Using Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[26]  D. Henderson,et al.  Experiencing Geometry: On Plane and Sphere , 1995 .

[27]  Jana Kosecka,et al.  3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Erik B. Dam,et al.  Quaternions, Interpolation and Animation , 2000 .

[29]  Nicola De Cao,et al.  Explorations in Homeomorphic Variational Auto-Encoding , 2018, ArXiv.

[30]  Ken Shoemake,et al.  Animating rotation with quaternion curves , 1985, SIGGRAPH.

[31]  François Fleuret,et al.  Improving Object Classification using Pose Information , 2012 .

[32]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Vincent Lepetit,et al.  3D Pose Estimation and 3D Model Retrieval for Objects in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  René Vidal,et al.  A Mixed Classification-Regression Framework for 3D Pose Estimation from 2D Images , 2018, BMVC.

[35]  Martial Hebert,et al.  Data-Driven 3D Primitives for Single Image Understanding , 2013, 2013 IEEE International Conference on Computer Vision.

[36]  Roger W. Brockett,et al.  Robotic manipulators and the product of exponentials formula , 1984 .

[37]  Marc Pollefeys,et al.  Discriminatively Trained Dense Surface Normal Estimation , 2014, ECCV.

[38]  William Rowan Hamilton,et al.  ON QUATERNIONS, OR ON A NEW SYSTEM OF IMAGINARIES IN ALGEBRA , 1847 .