Deep Quaternion Networks

The field of deep learning has advanced significantly in recent years, but most existing work has focused on real-valued models. Recent work has shown that a deep learning system using complex numbers can achieve greater depth for a fixed parameter budget than its real-valued counterpart. In this work, we explore the benefits of generalizing one step further, to the hypercomplex numbers, specifically the quaternions, and provide the architecture components needed to build deep quaternion networks. We develop the theoretical basis by reviewing quaternion convolutions and deriving a novel quaternion weight initialization scheme and novel algorithms for quaternion batch normalization. These components are tested by end-to-end training of a classification model on the CIFAR-10 and CIFAR-100 data sets and of a segmentation model on the KITTI Road Segmentation data set. The resulting quaternion networks show improved convergence compared to real-valued and complex-valued networks, especially on the segmentation task, while using fewer parameters.
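As a concrete illustration of the quaternion layers the abstract refers to, the minimal sketch below applies a quaternion-valued weight to a quaternion-valued input via the Hamilton product, the standard construction in the quaternion network literature: each quaternion tensor is stored as four real-valued arrays (r, i, j, k components), and the layer combines them with four shared real weight arrays. The function name `quaternion_linear` and the NumPy setup are illustrative choices, not from the paper; a quaternion convolution follows the same algebra with each matrix product replaced by a real-valued convolution.

```python
import numpy as np

def quaternion_linear(x, W):
    """Apply a quaternion-valued linear map via the Hamilton product.

    x: tuple (xr, xi, xj, xk) of arrays with shape (in_features,)
    W: tuple (Wr, Wi, Wj, Wk) of arrays with shape (out_features, in_features)

    Illustrative sketch: a quaternion convolution layer has the same
    structure, with each matrix-vector product below replaced by a
    real-valued convolution over the corresponding component maps.
    """
    xr, xi, xj, xk = x
    Wr, Wi, Wj, Wk = W
    # Hamilton product (Wr + Wi*i + Wj*j + Wk*k)(xr + xi*i + xj*j + xk*k),
    # expanded component by component:
    yr = Wr @ xr - Wi @ xi - Wj @ xj - Wk @ xk
    yi = Wr @ xi + Wi @ xr + Wj @ xk - Wk @ xj
    yj = Wr @ xj - Wi @ xk + Wj @ xr + Wk @ xi
    yk = Wr @ xk + Wi @ xj - Wj @ xi + Wk @ xr
    return yr, yi, yj, yk

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = tuple(rng.standard_normal(8) for _ in range(4))          # 8 quaternion inputs
    W = tuple(0.1 * rng.standard_normal((3, 8)) for _ in range(4))  # 3 quaternion outputs
    yr, yi, yj, yk = quaternion_linear(x, W)
    print(yr.shape)  # (3,)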
