DeQA: On-Device Question Answering

Today there is no effective support for device-wide question answering on mobile devices. State-of-the-art QA models are deep learning behemoths designed for the cloud; they run extremely slowly on phones and require more memory than phones provide. We present DeQA, a suite of latency and memory optimizations that adapts existing QA systems to run completely locally on mobile phones. Specifically, we design two latency optimizations that (1) stop processing documents once further processing cannot improve answer quality, and (2) identify computation that does not depend on the question and move it offline. These optimizations do not depend on the QA model internals and can be applied to several existing QA models. DeQA also implements a set of memory optimizations by (i) loading partial indexes in memory, (ii) working with smaller units of data, and (iii) replacing in-memory lookups with a key-value database. We use DeQA to port three state-of-the-art QA systems to the mobile device and evaluate them over three datasets. The first is the large-scale SQuAD dataset defined over the Wikipedia collection. We also create two on-device QA datasets: one over a publicly available email collection and the other over a cross-app data collection we obtained from two users. Our evaluations show that DeQA can run QA models with only a few hundred MBs of memory and provides at least a 13x average speedup on the mobile phone across all three datasets.
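To make the early-stopping latency optimization concrete, here is a minimal Python sketch. It assumes a reader model that returns one scored answer span per document and uses a score-margin rule as the stopping criterion; the names (answer_with_early_stopping, reader, score_margin_threshold) are illustrative, and the margin test stands in for whatever answer-quality signal the underlying QA system exposes, not DeQA's exact criterion.

```python
import math
from typing import Callable, Iterable, Tuple

def answer_with_early_stopping(
    question: str,
    ranked_docs: Iterable[str],
    reader: Callable[[str, str], Tuple[str, float]],
    score_margin_threshold: float = 0.5,
) -> str:
    """Read documents in retrieval-rank order; stop once the leading
    answer's score exceeds the runner-up's by a fixed margin."""
    best_answer = ""
    best_score = second_score = float("-inf")
    for doc in ranked_docs:
        answer, score = reader(question, doc)  # one QA-model call per document
        if score > best_score:
            best_answer, second_score, best_score = answer, best_score, score
        elif score > second_score:
            second_score = score
        # Skip the remaining (lower-ranked) documents once the leader is
        # far enough ahead that they are unlikely to change the answer.
        if math.isfinite(second_score) and \
                best_score - second_score > score_margin_threshold:
            break
    return best_answer
```

Because the loop only consumes (question, document) pairs and per-document answer scores, a criterion like this can wrap any of the ported QA models without touching their internals, which is the model-agnostic property the abstract claims.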
