Paper information - On the Continuity of Rotation Representations in Neural Networks

In neural networks, it is often desirable to work with various representations of the same space. For example, 3D rotations can be represented with quaternions or Euler angles. In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. We relate this to topological concepts such as homeomorphism and embedding. We then investigate what are continuous and discontinuous representations for 2D, 3D, and n-dimensional rotations. We demonstrate that for 3D rotations, all representations are discontinuous in the real Euclidean spaces of four or fewer dimensions. Thus, widely used representations such as quaternions and Euler angles are discontinuous and difficult for neural networks to learn. We show that the 3D rotations have continuous representations in 5D and 6D, which are more suitable for learning. We also present continuous representations for the general case of the n-dimensional rotation group SO(n). While our main focus is on rotations, we also show that our constructions apply to other groups such as the orthogonal group and similarity transforms. We finally present empirical results, which show that our continuous rotation representations outperform discontinuous ones for several practical problems in graphics and vision, including a simple autoencoder sanity test, a rotation estimator for 3D point clouds, and an inverse kinematics solver for 3D human poses.
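The abstract states that 3D rotations admit a continuous 6D representation but does not spell out the mapping. A minimal sketch follows, assuming the 6D vector is read as two 3D vectors that are orthonormalized by a Gram-Schmidt-style procedure, with the third axis recovered by a cross product. The function names `rotation_from_6d` and `rotation_to_6d` are illustrative, and the input is assumed to be non-degenerate (first vector nonzero, second vector not parallel to the first).

```python
import numpy as np


def rotation_from_6d(x):
    """Map a 6D vector to a 3x3 rotation matrix (Gram-Schmidt-style sketch)."""
    x = np.asarray(x, dtype=float)
    a1, a2 = x[:3], x[3:]
    # First column: normalize the first 3D vector.
    b1 = a1 / np.linalg.norm(a1)
    # Second column: remove the component of a2 along b1, then normalize.
    u2 = a2 - np.dot(b1, a2) * b1
    b2 = u2 / np.linalg.norm(u2)
    # Third column: fixed by the right-hand rule, so det(R) = +1.
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)


def rotation_to_6d(R):
    """Keep the first two columns of a rotation matrix as a 6D vector."""
    R = np.asarray(R, dtype=float)
    return np.concatenate([R[:, 0], R[:, 1]])


if __name__ == "__main__":
    R = rotation_from_6d([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    print(np.allclose(R.T @ R, np.eye(3)))      # orthonormal columns
    print(np.isclose(np.linalg.det(R), 1.0))    # proper rotation
```

In this sketch a network would output the raw 6D vector and apply `rotation_from_6d` as its final layer, so that small changes in the output produce small changes in the rotation wherever the input is non-degenerate.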
