One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-Off in Machine Learning Cloud Service APIs via Tolerance Tiers
暂无分享,去创建一个
Behzad Boroujerdian | Vijay Janapa Reddi | Matthew Halpern | Evelyn Duesterwald | Todd Mummert | Todd W. Mummert | V. Reddi | E. Duesterwald | Matthew Halpern | Behzad Boroujerdian
[1] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[2] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[3] Christina Delimitrou,et al. Tarcil: reconciling scheduling speed and quality in large shared clusters , 2015, SoCC.
[4] Christina Delimitrou,et al. HCloud: Resource-Efficient Provisioning in Shared Cloud Systems , 2016, ASPLOS.
[5] Gu-Yeon Wei,et al. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[6] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.
[7] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.
[8] H. T. Kung,et al. BranchyNet: Fast inference via early exiting from deep neural networks , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).
[9] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[10] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[11] Tara N. Sainath,et al. Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Michael I. Jordan,et al. The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox , 2014, CIDR.
[13] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[14] Scott A. Mahlke,et al. Rumba: An online quality management system for approximate computing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[15] Benjamin C. Lee,et al. Market mechanisms for managing datacenters with heterogeneous microarchitectures , 2014, TOCS.
[16] Thomas G. Dietterich. Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.
[17] Kaushik Roy,et al. Conditional Deep Learning for energy-efficient and enhanced pattern recognition , 2015, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[18] Thomas F. Wenisch,et al. The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services , 2014, OSDI.
[19] Jose-Maria Arnau,et al. An ultra low-power hardware accelerator for automatic speech recognition , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Christina Delimitrou,et al. Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.
[21] Martin C. Rinard,et al. Chisel: reliability- and accuracy-aware optimization of approximate computational kernels , 2014, OOPSLA.
[22] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Alec Wolman,et al. MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints , 2016, MobiSys.
[24] Randy H. Katz,et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.
[25] Thomas F. Wenisch,et al. Power management of online data-intensive services , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[26] Kevin Skadron,et al. Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[27] David Hinkley,et al. Bootstrap Methods: Another Look at the Jackknife , 2008 .
[28] Muli Ben-Yehuda,et al. Deconstructing Amazon EC2 Spot Instance Pricing , 2011, 2011 IEEE Third International Conference on Cloud Computing Technology and Science.
[29] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[30] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Scott A. Mahlke,et al. Input responsiveness: using canary inputs to dynamically steer approximation , 2016, PLDI.
[32] Lingjia Tang,et al. Whare-map: heterogeneity in "homogeneous" warehouse-scale computers , 2013, ISCA.
[33] Christina Delimitrou,et al. Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.
[34] George Saon,et al. The IBM 2015 English conversational telephone speech recognition system , 2015, INTERSPEECH.
[35] Geoffrey Zweig,et al. Parallelizing WFST speech decoders , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Sungroh Yoon,et al. Big/little deep neural network for ultra low power inference , 2015, 2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[37] Geoffrey Zweig,et al. The microsoft 2016 conversational speech recognition system , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Hadi Esmaeilzadeh,et al. Towards Statistical Guarantees in Controlling Quality Tradeoffs for Approximate Acceleration , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[39] Xin Wang,et al. Clipper: A Low-Latency Online Prediction Serving System , 2016, NSDI.
[40] Gu-Yeon Wei,et al. Profiling a warehouse-scale computer , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[41] Ronald G. Dreslinski,et al. Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers , 2015, ASPLOS.
[42] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.