Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks

Many state-of-the-art neural models for NLP are heavily parameterized and thus memory-inefficient. This paper proposes a series of lightweight and memory-efficient neural architectures for a range of natural language processing (NLP) tasks. To this end, our models exploit computation in Quaternion algebra and hypercomplex spaces, enabling not only expressive inter-component interactions but also a significant ($75\%$) reduction in parameter size owing to the fewer degrees of freedom in the Hamilton product. We propose Quaternion variants of standard NLP models, giving rise to new architectures such as the Quaternion Attention Model and the Quaternion Transformer. Extensive experiments on a battery of NLP tasks demonstrate the utility of the proposed Quaternion-inspired models, enabling up to $75\%$ reduction in parameter size without significant loss in performance.
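The $75\%$ figure follows from weight sharing in the Hamilton product: a quaternion layer mapping $n$ input quaternions to $m$ output quaternions stores $4nm$ real weights, whereas an unconstrained real dense layer over the same $4n \to 4m$ dimensions stores $16nm$. The following is a minimal NumPy sketch of this idea, not the authors' implementation; the names `hamilton_product` and `quaternion_linear` are illustrative only.

```python
# Minimal sketch (assumed names, not from the paper) of a quaternion
# "dense" layer built from the Hamilton product.
import numpy as np

def hamilton_product(q, w):
    """Hamilton product of quaternions stored as (..., 4) arrays
    holding the (real, i, j, k) components on the last axis."""
    a, b, c, d = np.moveaxis(q, -1, 0)
    wa, wb, wc, wd = np.moveaxis(w, -1, 0)
    return np.stack([
        a * wa - b * wb - c * wc - d * wd,   # real part
        a * wb + b * wa + c * wd - d * wc,   # i part
        a * wc - b * wd + c * wa + d * wb,   # j part
        a * wd + b * wc - c * wb + d * wa,   # k part
    ], axis=-1)

def quaternion_linear(x, W):
    """Map n input quaternions to m outputs: y_j = sum_i x_i * W[i, j].
    W has shape (n, m, 4), i.e. 4*n*m real weights, vs. (4n)*(4m) = 16*n*m
    for a real dense layer on the same dimensions -- a 75% saving."""
    # Broadcast x over the m output units, multiply, sum over inputs.
    x_b = np.broadcast_to(x[:, None, :], W.shape)
    return hamilton_product(x_b, W).sum(axis=0)

# Usage: 2 input quaternions (8 real dims) -> 3 outputs (12 real dims).
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
W = rng.normal(size=(2, 3, 4))
y = quaternion_linear(x, W)
print(y.shape)                                        # (3, 4)
print(W.size, "weights vs", 8 * 12, "for a real dense layer")  # 24 vs 96
```

The parameter saving is visible in the final line: 24 real weights against 96 for an equivalent unconstrained real layer, i.e. exactly the $75\%$ reduction claimed above.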
