INFaaS: Automated Model-less Inference Serving
Christoforos E. Kozyrakis | Neeraja J. Yadwadkar | Francisco Romero | Qian Li
[1] Amar Phanishayee,et al. Themis: Fair and Efficient GPU Cluster Scheduling , 2020, NSDI.
[2] Paramvir Bahl,et al. Live Video Analytics at Scale with Approximation and Delay-Tolerance , 2017, NSDI.
[3] Minlan Yu,et al. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics , 2017, NSDI.
[4] Christoforos E. Kozyrakis,et al. Pocket: Elastic Ephemeral Storage for Serverless Analytics , 2018, OSDI.
[5] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[6] Ricardo Bianchini,et al. Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider , 2020, USENIX Annual Technical Conference.
[7] Sameh Elnikety,et al. Model-Switching: Dealing with Fluctuating Workloads in Machine-Learning-as-a-Service Systems , 2020, HotCloud.
[8] Ion Stoica,et al. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics , 2016, NSDI.
[9] Cody Coleman,et al. MLPerf Inference Benchmark , 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[10] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[11] Wencong Xiao,et al. Gandiva: Introspective Cluster Scheduling for Deep Learning , 2018, OSDI.
[12] Eric S. Chung,et al. A Configurable Cloud-Scale DNN Processor for Real-Time AI , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[13] Paramvir Bahl,et al. Focus: Querying Large Video Datasets with Low Latency and Low Cost , 2018, OSDI.
[14] David S. Johnson,et al. Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .
[15] J. Gathen,et al. A bound on solutions of linear integer equalities and inequalities , 1978 .
[16] Michael Riley,et al. Semantic Lattice Processing in Contextual Automatic Speech Recognition for Google Assistant , 2018, INTERSPEECH.
[17] Ymir Vigfusson,et al. Serving DNNs like Clockwork: Performance Predictability from the Bottom Up , 2020, OSDI.
[18] Byung-Gon Chun,et al. PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems , 2018, OSDI.
[19] Ion Stoica,et al. Occupy the cloud: distributed computing for the 99% , 2017, SoCC.
[20] Behzad Boroujerdian,et al. One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-Off in Machine Learning Cloud Service APIs via Tolerance Tiers , 2019, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[21] Jie Wu,et al. Energy efficient virtual machine placement algorithm with balanced and improved resource utilization in a data center , 2013, Math. Comput. Model..
[22] Mor Harchol-Balter,et al. AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers , 2012, TOCS.
[23] William A. Wulf,et al. Policy/mechanism separation in Hydra , 1975, SOSP.
[24] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[25] David M. Brooks,et al. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[26] Hadi Esmaeilzadeh,et al. Shredder: Learning Noise Distributions to Protect Inference Privacy , 2020, ASPLOS.
[27] Mosharaf Chowdhury,et al. Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications , 2019, MLSys.
[28] Carole-Jean Wu,et al. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation , 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[29] M. Tamer Özsu,et al. ConfluxDB: Multi-Master Replication for Partitioned Snapshot Isolation Databases , 2014, Proc. VLDB Endow..
[30] Quan Quan,et al. A portable, automatic data quantizer for deep neural networks , 2018, PACT.
[31] Joseph Gonzalez,et al. InferLine: latency-aware provisioning and scaling for prediction serving pipelines , 2020, SoCC.
[32] Carlo Curino,et al. Hydra: a federated resource manager for data-center scale analytics , 2019, NSDI.
[33] Laura Kallmeyer,et al. Multilingual Multi-class Sentiment Classification Using Convolutional Neural Networks , 2018, LREC.
[34] Pat Hanrahan,et al. Scanner: Efficient Video Analysis at Scale , 2018, ACM Trans. Graph..
[35] Kang G. Shin,et al. Tiresias: A GPU Cluster Manager for Distributed Deep Learning , 2019, NSDI.
[36] Ingmar Weber,et al. Automated Hate Speech Detection and the Problem of Offensive Language , 2017, ICWSM.
[37] Wei Wang,et al. MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving , 2019, USENIX Annual Technical Conference.
[38] Christina Delimitrou,et al. Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.
[39] Matei Zaharia,et al. NoScope: Optimizing Deep CNN-Based Queries over Video Streams at Scale , 2017, Proc. VLDB Endow..
[40] Steven S. Seiden,et al. On the online bin packing problem , 2001, JACM.
[41] Ion Stoica,et al. Chameleon: scalable adaptation of video analytics , 2018, SIGCOMM.
[42] Haichen Shen,et al. Nexus: a GPU cluster engine for accelerating DNN-based video analysis , 2019, SOSP.
[43] Christina Delimitrou,et al. Mage: online and interference-aware scheduling for multi-scale heterogeneous systems , 2018, PACT.
[44] Srikanth Kandula,et al. Jockey: guaranteed job latency in data parallel clusters , 2012, EuroSys '12.
[45] Xin Wang,et al. Clipper: A Low-Latency Online Prediction Serving System , 2016, NSDI.
[46] Jinjun Xiong,et al. TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments , 2018, ArXiv.
[47] Carole-Jean Wu,et al. DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[48] Ajay Jain,et al. Dynamic Space-Time Scheduling for GPU Inference , 2018, ArXiv.
[49] Randy H. Katz,et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.
[50] Sameh Elnikety,et al. Swayam: distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency , 2017, Middleware.
[51] Qian Li,et al. A Case for Managed and Model-less Inference Serving , 2019, HotOS.
[52] Paolo Napoletano,et al. Benchmark Analysis of Representative Deep Neural Network Architectures , 2018, IEEE Access.
[53] Krish Shankar,et al. Azure Machine Learning , 2019 .