Adaptive Pruning of Neural Language Models for Mobile Devices

Neural language models (NLMs) exist in an accuracy-efficiency tradeoff space where better perplexity typically comes at the cost of greater computational complexity. In a software keyboard application on mobile devices, this translates into higher power consumption and shorter battery life. This paper represents, to our knowledge, the first attempt to explore accuracy-efficiency tradeoffs for NLMs. Building on quasi-recurrent neural networks (QRNNs), we apply pruning techniques to provide a "knob" for selecting different operating points. In addition, we propose a simple technique to recover some perplexity using a negligible amount of memory. Our empirical evaluation considers both perplexity and energy consumption on a Raspberry Pi, where we demonstrate which methods provide the best perplexity-power consumption operating points. At one operating point, one of the techniques is able to provide energy savings of 40% over the state of the art with only a 17% relative increase in perplexity.
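The abstract describes pruning as a "knob" that trades perplexity for computation. The sketch below illustrates one common way such a knob can be realized, magnitude-based row (filter) pruning of a single weight matrix swept over several sparsity levels; it is an illustrative assumption, not necessarily the paper's exact pruning criterion, and the stand-in tensor shapes are hypothetical rather than taken from the QRNN architecture.

```python
# Minimal sketch of magnitude-based pruning as an accuracy-efficiency "knob".
# Assumption: the pruning criterion (L1 row norm) and the 512x512 stand-in
# weight matrix are illustrative, not the paper's exact method or model sizes.
import torch

def prune_by_magnitude(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude rows (filters) of a weight matrix.

    `sparsity` in [0, 1) selects the operating point: higher sparsity means
    fewer multiply-accumulates at inference time, typically at some cost
    in perplexity.
    """
    n_rows = weight.size(0)
    n_prune = int(sparsity * n_rows)
    if n_prune == 0:
        return weight
    # Rank rows by their L1 norm and zero the weakest ones.
    row_norms = weight.abs().sum(dim=1)
    prune_idx = torch.argsort(row_norms)[:n_prune]
    pruned = weight.clone()
    pruned[prune_idx] = 0.0
    return pruned

# Usage: sweep sparsity levels to trace out candidate operating points,
# which would then be evaluated for perplexity and on-device energy use.
weight = torch.randn(512, 512)  # stand-in for one recurrent-layer weight matrix
for sparsity in (0.0, 0.25, 0.5, 0.75):
    pruned = prune_by_magnitude(weight, sparsity)
    kept = int((pruned.abs().sum(dim=1) > 0).sum())
    print(f"sparsity={sparsity:.2f}: {kept}/{weight.size(0)} rows retained")
```

Each sparsity level corresponds to one operating point; in the paper's setting, the resulting models would be compared by perplexity and by measured power draw on the Raspberry Pi.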
