Auto-Split: A General Framework of Collaborative Edge-Cloud AI
Jian Pei | Lanjun Wang | Amin Banitalebi-Dehkordi | Naveen Vedula | Yong Zhang | Fei Xia
[1] Shuchang Zhou, et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, 2016, ArXiv.
[2] H. T. Kung, et al. BranchyNet: Fast inference via early exiting from deep neural networks, 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).
[3] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[4] Zhiru Zhang, et al. Improving Neural Network Quantization without Retraining using Outlier Channel Splitting, 2019, ICML.
[5] Jianxin Wu, et al. Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[6] Vivienne Sze, et al. Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs, 2019, 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[7] Daniel Soudry, et al. Post training 4-bit quantization of convolutional networks for rapid-deployment, 2018, NeurIPS.
[8] Alexander S. Ecker, et al. Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming, 2019, ArXiv.
[9] Steven K. Esser, et al. Learned Step Size Quantization, 2019, ICLR.
[10] Yujeong Choi, et al. PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units, 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[11] Zhi Zhou, et al. Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing, 2019, IEEE Transactions on Wireless Communications.
[12] Yuandong Tian, et al. Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search, 2018, ArXiv.
[13] Niraj K. Jha, et al. ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Bernd Girod, et al. Towards Effective 2-bit Quantization: Pareto-optimal Bit Allocation for Deep CNNs Compression, 2019.
[15] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[16] Guy Jacob, et al. Neural Network Distiller: A Python Package For DNN Compression Research, 2019, ArXiv.
[17] Bo Chen, et al. NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications, 2018, ECCV.
[18] Kaisheng Ma, et al. SCAN: A Scalable Neural Networks Framework Towards Compact and Efficient Models, 2019, NeurIPS.
[19] Swagath Venkataramani, et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks, 2018, ArXiv.
[20] G. Hua, et al. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks, 2018, ECCV.
[21] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[22] Brucek Khailany, et al. Timeloop: A Systematic Approach to DNN Accelerator Evaluation, 2019, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[23] Dan Wang, et al. Dynamic Adaptive DNN Surgery for Inference Acceleration on the Edge, 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.
[24] Kurt Keutzer, et al. HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[25] Dan Alistarh, et al. Model compression via distillation and quantization, 2018, ICLR.
[26] Bo Chen, et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[27] Kurt Keutzer, et al. ZeroQ: A Novel Zero Shot Quantization Framework, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Yair Shoham, et al. Efficient bit allocation for an arbitrary set of quantizers [speech coding], 1988, IEEE Trans. Acoust. Speech Signal Process..
[29] Asit K. Mishra, et al. Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy, 2017, ICLR.
[30] Luca Benini, et al. PULP-NN: accelerating quantized neural networks on parallel ultra-low-power RISC-V processors, 2019, Philosophical Transactions of the Royal Society A.
[31] Thomas G. Dietterich, et al. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, 2018, ICLR.
[32] Lothar Thiele, et al. Rethinking Pruning for Accelerating Deep Inference At the Edge, 2020, KDD.
[33] Ji Liu, et al. Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking, 2018, ICLR.
[34] Shigeng Zhang, et al. Towards Real-time Cooperative Deep Inference over the Cloud and Edge End Devices, 2020, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol..
[35] Luca Benini, et al. Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers, 2019, MLSys.
[36] Philip S. Yu, et al. Not Just Privacy: Improving Performance of Private Deep Learning in Mobile Cloud, 2018, KDD.
[37] Tao Zhang, et al. A Survey of Model Compression and Acceleration for Deep Neural Networks, 2017, ArXiv.
[38] Joel Emer, et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks, 2016, ISCA.
[39] Xiangyu Zhang, et al. Channel Pruning for Accelerating Very Deep Neural Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[40] Raghuraman Krishnamoorthi, et al. Quantizing deep convolutional networks for efficient inference: A whitepaper, 2018, ArXiv.
[41] Bo Chen, et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Avi Mendelson, et al. Loss Aware Post-training Quantization, 2019, ArXiv.
[43] Matthew Mattina, et al. SCALE-Sim: Systolic CNN Accelerator, 2018, ArXiv.
[44] Zhijian Liu, et al. HAQ: Hardware-Aware Automated Quantization With Mixed Precision, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Yoni Choukroun, et al. Low-bit Quantization of Neural Networks for Efficient Inference, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[46] Ilias Leontiadis, et al. SPINN: synergistic progressive inference of neural networks over device and cloud, 2020, MobiCom.
[47] Kilian Q. Weinberger, et al. Multi-Scale Dense Networks for Resource Efficient Image Classification, 2017, ICLR.
[49] Elad Hoffer, et al. ACIQ: Analytical Clipping for Integer Quantization of neural networks, 2018, ArXiv.
[50] Trevor N. Mudge, et al. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge, 2017, ASPLOS.