End to End Binarized Neural Networks for Text Classification

Deep neural networks have demonstrated superior performance on almost every Natural Language Processing task; however, their increasing complexity raises concerns. In particular, these networks require expensive computational hardware, and their training budgets are out of reach for many. Even for a trained network, the inference phase can be too demanding for resource-constrained devices, which limits its applicability. The state-of-the-art transformer models are a vivid example. Simplifying the computations performed by a network is one way of relaxing these complexity requirements. In this paper, we propose an end-to-end binarized neural network architecture for the intent classification task. To fully exploit the potential of end-to-end binarization, both the input representations (vector embeddings of token statistics) and the classifier are binarized. We demonstrate the efficiency of this architecture on intent classification of short texts over three datasets and on text classification with a larger dataset. The proposed architecture achieves results comparable to the state-of-the-art on standard intent classification datasets while using roughly 20-40% less memory and training time. Furthermore, the individual components of the architecture, such as the binarized vector embeddings of documents or the binarized classifier, can be used separately within architectures that are not fully binarized.
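
To illustrate the kind of architecture described above, the sketch below combines a binary hyperdimensional embedding of character n-gram statistics with a linear classifier whose weights are binarized with sign(). It is an illustrative approximation under stated assumptions, not the authors' implementation: the names, the dimensionality, the encoding details, and the omission of training (e.g., via a straight-through estimator) are all assumptions made for this example.

```python
# Minimal NumPy sketch (assumed, illustrative) of two binarized components:
# (i) a binary embedding of character n-gram statistics and
# (ii) a linear classifier whose weights are binarized at inference.
import zlib
import numpy as np

DIM = 10_000  # hypervector dimensionality (assumed value)

def ngram_hypervector(ngram: str, dim: int = DIM) -> np.ndarray:
    """Deterministic random bipolar {-1, +1} hypervector for one n-gram."""
    rng = np.random.default_rng(zlib.crc32(ngram.encode("utf-8")))
    return rng.choice([-1, 1], size=dim).astype(np.int8)

def binary_embedding(text: str, n: int = 3, dim: int = DIM) -> np.ndarray:
    """Binarized document embedding: bundle (sum) the hypervectors of all
    character n-grams in the text, then threshold the bundle back to {-1, +1}."""
    acc = np.zeros(dim, dtype=np.int32)
    for i in range(len(text) - n + 1):
        acc += ngram_hypervector(text[i:i + n], dim)
    return np.where(acc >= 0, 1, -1).astype(np.int8)

class BinarizedLinearClassifier:
    """Keeps real-valued latent weights but performs inference with sign(W)."""
    def __init__(self, dim: int, n_classes: int, seed: int = 0):
        self.w = np.random.default_rng(seed).normal(size=(n_classes, dim))

    def predict(self, x: np.ndarray) -> int:
        w_bin = np.sign(self.w)            # binarize weights to {-1, +1}
        return int(np.argmax(w_bin @ x))   # inference uses only +/-1 arithmetic

# Usage: embed a short query and classify it (weights here are untrained).
query_embedding = binary_embedding("set an alarm for seven am")
classifier = BinarizedLinearClassifier(DIM, n_classes=5)
print(classifier.predict(query_embedding))
```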
