Automatic and Efficient Customization of Neural Networks for ML Applications

ML APIs have greatly relieved application developers of the burden of designing and training their own neural network models: classifying objects in an image can now be as simple as one line of Python code that calls an API. However, these APIs offer the same pre-trained models regardless of how different applications use their output. This can be suboptimal because not all ML inference errors cause application failures, and the distinction between errors that do and do not cause failures varies greatly across applications. To tackle this problem, we first study 77 real-world applications, which collectively use six ML APIs from two providers, to reveal common patterns in how ML API output affects applications' decision processes. Inspired by the findings, we propose ChameleonAPI, an optimization framework for ML APIs that takes effect without changing the application source code. ChameleonAPI provides application developers with a parser that automatically analyzes the application to produce an abstract of its decision process. This abstract is then used to devise an application-specific loss function that penalizes only those API output errors that are critical to the application. ChameleonAPI uses the loss function to efficiently train a neural network model customized for each application and deploys it to serve API invocations from that application via the existing interface. Compared to a baseline that selects the best of all commercial ML APIs, we show that ChameleonAPI reduces incorrect application decisions by 43%.
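To make the idea of an application-specific loss concrete, the following is a minimal toy sketch in Python. It is not ChameleonAPI's actual loss function, which the abstract does not specify. The decision groups, the label names, and the functions `app_decision` and `app_aware_loss` are hypothetical, and the hard zero for harmless errors makes this closer to an evaluation criterion than to a differentiable training loss.

```python
import math

# Hypothetical abstract of an application's decision process: the app checks
# the labels returned by the API against each group in order and takes the
# first matching branch. These groups are invented for illustration.
DECISION_GROUPS = {
    "recycle": {"bottle", "can", "paper"},
    "compost": {"banana", "apple"},
}


def app_decision(labels):
    """Return the branch the hypothetical application takes for a label set."""
    for branch, group in DECISION_GROUPS.items():
        if labels & group:
            return branch
    return "other"


def bce(p, y):
    """Binary cross-entropy for one label, with clipping to avoid log(0)."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def app_aware_loss(scores, true_labels, threshold=0.5):
    """Penalize label errors only when they change the application's decision.

    scores: dict mapping label -> predicted probability.
    true_labels: set of ground-truth labels.
    """
    predicted = {label for label, p in scores.items() if p >= threshold}
    if app_decision(predicted) == app_decision(true_labels):
        # Any remaining label errors are harmless to this application.
        return 0.0
    # Only labels the application actually inspects contribute to the loss.
    relevant = set().union(*DECISION_GROUPS.values())
    return sum(
        bce(scores.get(label, 0.0), 1.0 if label in true_labels else 0.0)
        for label in relevant
    )


# A spurious "dog" label does not change the decision, so it costs nothing.
harmless = app_aware_loss({"bottle": 0.9, "dog": 0.8}, {"bottle"})
# Missing "bottle" and predicting "banana" flips the branch, so it is penalized.
critical = app_aware_loss({"bottle": 0.2, "banana": 0.7}, {"bottle"})
```

In this sketch, `harmless` is 0.0 because the application still takes the "recycle" branch, while `critical` is positive because the application would wrongly take the "compost" branch.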
