Data Life Aware Model Updating Strategy for Stream-based Online Deep Learning

Many deep learning applications are deployed in dynamic environments whose data change over time, so their models must be continuously updated with streaming data to keep capturing data trends. However, most state-of-the-art learning frameworks support offline training well but lack online model updating strategies. In this work, we propose and implement iDlaLayer, a thin middleware layer on top of existing training frameworks that streamlines the support and implementation of online deep learning applications. To achieve both good model quality and fast data incorporation, we design a Data Life Aware model updating strategy (DLA), which builds training samples according to the contributions of data at different life stages and accounts for the training cost consumed in model updating. We evaluate iDlaLayer's performance through both simulations and experiments based on TensorflowOnSpark with three representative online learning workloads. Our experimental results demonstrate that iDlaLayer reduces the overall elapsed time of MNIST, Criteo, and PageRank by 11.3%, 28.2%, and 15.2%, respectively, compared to the periodic update strategy. It further achieves an average 20% decrease in training cost and about a 5% improvement in model quality over the traditional continuous training method.
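The core idea of DLA — weighting streaming data by its life stage and updating the model only when the weighted contribution justifies the training cost — can be sketched as follows. This is a minimal illustration only: the decay function, the half-life parameter, and the gain-versus-cost trigger are assumptions for exposition, not the paper's actual formulas.

```python
import math

def life_stage_weight(age_s, half_life_s=3600.0):
    """Exponential-decay weight: data from a newer life stage
    contributes more to the training sample (assumed decay form)."""
    return math.exp(-math.log(2.0) * age_s / half_life_s)

def build_training_batch(samples, now, half_life_s=3600.0):
    """Attach a life-stage weight to each (timestamp, record) sample,
    so older records are down-weighted rather than discarded."""
    return [(x, life_stage_weight(now - t, half_life_s)) for t, x in samples]

def should_update(weighted_batch, training_cost, gain_per_unit_weight=1.0):
    """Cost-aware trigger (hypothetical): fire a model update only when
    the estimated quality gain from the accumulated weighted data
    outweighs the training cost of the update."""
    total_weight = sum(w for _, w in weighted_batch)
    return total_weight * gain_per_unit_weight >= training_cost

# Example: one record arriving at t=0 and one at t=3600, evaluated at t=3600.
batch = build_training_batch([(0.0, "a"), (3600.0, "b")], now=3600.0)
print(should_update(batch, training_cost=1.0))
```

Under this sketch, a periodic updater would retrain on every interval regardless of cost, whereas the `should_update` gate skips updates whose weighted data contribution is too small to repay the training cost — the trade-off the abstract's 20% training-cost reduction refers to.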