Serving Deep Learning Models in a Serverless Platform

Serverless computing has emerged as a compelling paradigm for the development and deployment of a wide range of event based cloud applications. At the same time, cloud providers and enterprise companies are heavily adopting machine learning and Artificial Intelligence to either differentiate themselves, or provide their customers with value added services. In this work we evaluate the suitability of a serverless computing environment for the inferencing of large neural network models. Our experimental evaluations are executed on the AWS Lambda environment using the MxNet deep learning framework. Our experimental results show that while the inferencing latency can be within an acceptable range, longer delays due to cold starts can skew the latency distribution and hence risk violating more stringent SLAs.

[1]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[2]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[3]  Michael I. Jordan,et al.  The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox , 2014, CIDR.

[4]  Ian T. Foster,et al.  Ripple: Home Automation for Research Data Management , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW).

[5]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[7]  Ali Kanso,et al.  Leveraging the Serverless Architecture for Securing Linux Containers , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW).

[8]  Xin Wang,et al.  Clipper: A Low-Latency Online Prediction Serving System , 2016, NSDI.

[9]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[10]  Castro Paul,et al.  Cloud-Native, Event-Based Programming for Mobile Applications , 2016 .

[11]  Perry Cheng,et al.  Serverless Computing: Current Trends and Open Problems , 2017, Research Advances in Cloud Computing.

[12]  Philippe Merle,et al.  Autonomic Vertical Elasticity of Docker Containers with ELASTICDOCKER , 2017, 2017 IEEE 10th International Conference on Cloud Computing (CLOUD).

[13]  Geoffrey C. Fox,et al.  Status of Serverless Computing and Function-as-a-Service(FaaS) in Industry and Research , 2017, ArXiv.

[14]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Paul R. Brenner,et al.  Serverless Computing: Design, Implementation, and Performance , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW).

[16]  Perry Cheng,et al.  Building a Chatbot with Serverless Computing , 2016, MOTA@Middleware.

[17]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.