Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving
Wei Wang | Bo Li | Jian He | Luping Wang | Xianchao Sun | Liping Zhang | Yinghao Yu | Lingyun Yang | Wen Wang
[1] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[2] Peter I. Frazier, et al. A Tutorial on Bayesian Optimization, 2018, ArXiv.
[3] Pat Hanrahan, et al. Scanner: Efficient Video Analysis at Scale, 2018, ACM Trans. Graph.
[4] Marco Maggioni, et al. Dissecting the NVidia Turing T4 GPU via Microbenchmarking, 2019, ArXiv.
[5] Deepak Agarwal, et al. LASER: a scalable response prediction platform for online advertising, 2014, WSDM.
[6] Nan Hua, et al. Universal Sentence Encoder, 2018, ArXiv.
[7] Krzysztof Rzadca, et al. Autopilot: workload autoscaling at Google, 2020, EuroSys.
[8] Minlan Yu, et al. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics, 2017, NSDI.
[9] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[10] David M. Brooks, et al. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[11] Bo Chen, et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Beng Chin Ooi, et al. Rafiki: Machine Learning as an Analytics Service System, 2018, Proc. VLDB Endow.
[13] Ion Stoica, et al. Chameleon: scalable adaptation of video analytics, 2018, SIGCOMM.
[14] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[15] Olatunji Ruwase, et al. SERF: Efficient Scheduling for Fast Deep Neural Network Serving via Judicious Parallelism, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[16] Mosharaf Chowdhury, et al. Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications, 2019, MLSys.
[17] Tim Menzies, et al. Arrow: Low-Level Augmented Bayesian Optimization for Finding the Best Cloud VM, 2017, 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS).
[18] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[19] Masashi Sugiyama, et al. Few-shot Domain Adaptation by Causal Mechanism Transfer, 2020, ICML.
[20] Ion Stoica, et al. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics, 2016, NSDI.
[21] Kian Hsiang Low, et al. Decentralized High-Dimensional Bayesian Optimization with Factor Graphs, 2017, AAAI.
[22] Randy H. Katz, et al. Selecting the best VM across multiple public clouds: a data-driven performance modeling approach, 2017, SoCC.
[23] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[24] Ymir Vigfusson, et al. Serving DNNs like Clockwork: Performance Predictability from the Bottom Up, 2020, OSDI.
[25] Martial Hebert, et al. Learning to Learn: Model Regression Networks for Easy Small Sample Learning, 2016, ECCV.
[26] D. Sculley, et al. Google Vizier: A Service for Black-Box Optimization, 2017, KDD.
[27] Yongjun Park, et al. Improving GPU Multitasking Efficiency Using Dynamic Resource Sharing, 2019, IEEE Computer Architecture Letters.
[28] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[29] Tim Menzies, et al. Scout: An Experienced Guide to Find the Best Cloud Configuration, 2018, ArXiv.
[30] Wei Wang, et al. MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving, 2019, USENIX Annual Technical Conference.
[31] Matthias Seeger, et al. Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning, 2019, NeurIPS.
[32] Christoforos E. Kozyrakis, et al. INFaaS: Automated Model-less Inference Serving, 2021, USENIX Annual Technical Conference.
[33] InferLine: ML Inference Pipeline Composition Framework, 2018, ArXiv.
[34] Léon Bottou, et al. Stochastic Gradient Descent Tricks, 2012, Neural Networks: Tricks of the Trade.
[35] Srikanth Kandula, et al. Jockey: guaranteed job latency in data parallel clusters, 2012, EuroSys '12.
[36] Xin Wang, et al. Clipper: A Low-Latency Online Prediction Serving System, 2016, NSDI.
[37] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.
[38] Cheng Li, et al. High Dimensional Bayesian Optimization with Elastic Gaussian Process, 2017, ICML.
[39] Carole-Jean Wu, et al. MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance, 2020, IEEE Micro.
[40] Sameh Elnikety, et al. Swayam: distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency, 2017, Middleware.
[41] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[42] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Byung-Gon Chun, et al. PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems, 2018, OSDI.
[44] Yong Li, et al. MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters, 2022, NSDI.
[45] Nando de Freitas, et al. Bayesian Optimization in High Dimensions via Random Embeddings, 2013, IJCAI.
[46] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[47] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.
[48] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.
[49] Vijay Vasudevan, et al. Learning Transferable Architectures for Scalable Image Recognition, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Meng Cai, et al. Efficient One-Pass Decoding with NNLM for Speech Recognition, 2014, IEEE Signal Processing Letters.
[51] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Jie Cheng, et al. CUDA by Example: An Introduction to General-Purpose GPU Programming, 2010, Scalable Comput. Pract. Exp.
[53] Ajay Jain, et al. Dynamic Space-Time Scheduling for GPU Inference, 2018, ArXiv.
[54] Foster J. Provost, et al. Scalable hands-free transfer learning for online advertising, 2014, KDD.
[55] Rami G. Melhem, et al. Quality of service support for fine-grained sharing on GPUs, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).