Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters

Recent works have demonstrated reasonable success of representation learning in hypercomplex space. Specifically, fully-connected layers with quaternions (4D hypercomplex numbers), which replace the real-valued matrix multiplication in a fully-connected layer with the Hamilton product of quaternions, use only 1/4 of the learnable parameters while achieving comparable performance in various applications. However, one key caveat is that hypercomplex spaces exist only at a few predefined dimensionalities (4D, 8D, and 16D), which restricts the flexibility of models that leverage hypercomplex multiplications. To this end, we propose parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined. As a result, our method not only subsumes the Hamilton product but also learns to operate on any arbitrary nD hypercomplex space, providing greater architectural flexibility with only 1/n of the learnable parameters of the fully-connected layer counterpart for any chosen n. Experiments applying the method to LSTM and Transformer models on natural language inference, machine translation, text style transfer, and subject-verb agreement demonstrate the architectural flexibility and effectiveness of the proposed approach.
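
The abstract does not spell out the parameterization itself, so the following is only a minimal PyTorch sketch of one plausible realization: the fully-connected weight is assumed to be built as a sum of n Kronecker products between small learned "rule" matrices and component weight matrices, so the dominant parameter count scales as 1/n. The names PHMLinear and phm_weight, and the specific Kronecker-product construction, are illustrative assumptions rather than details taken from this abstract.

    import torch

    def phm_weight(A, S):
        # Build the (d x k) weight as a sum of n Kronecker products:
        # W = sum_i kron(A[i], S[i]), where A[i] is n x n and S[i] is (d/n) x (k/n).
        return sum(torch.kron(A[i], S[i]) for i in range(A.shape[0]))

    class PHMLinear(torch.nn.Module):
        # Hypothetical PHM-style layer: y = x W, with roughly d*k/n learnable parameters.
        def __init__(self, d, k, n):
            super().__init__()
            assert d % n == 0 and k % n == 0
            self.A = torch.nn.Parameter(0.1 * torch.randn(n, n, n))            # learned multiplication "rules"
            self.S = torch.nn.Parameter(0.1 * torch.randn(n, d // n, k // n))  # component weights

        def forward(self, x):
            return x @ phm_weight(self.A, self.S)

    # Example: a 64 -> 128 projection with n = 4 uses about 1/4 of the
    # parameters of a dense 64 x 128 weight matrix (2048 + 64 vs. 8192).
    layer = PHMLinear(64, 128, n=4)
    y = layer(torch.randn(2, 64))

With n = 4 and suitably fixed rule matrices, such a construction can recover a quaternion (Hamilton product) layer; letting the rule matrices be learned is what allows arbitrary n in this sketch.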
