Automatic generation of high-performance quantized machine learning kernels
Thierry Moreau | Meghan Cowan | Luis Ceze | James Bornholt | Tianqi Chen
[1] Ran El-Yaniv,et al. Binarized Neural Networks , 2016, NIPS.
[2] Emina Torlak,et al. A lightweight symbolic virtual machine for solver-aided host languages , 2014, PLDI.
[3] Amrita Mazumdar,et al. Exploring computation-communication tradeoffs in camera systems , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).
[4] Swagath Venkataramani,et al. Accurate and Efficient 2-bit Quantized Neural Networks , 2019, MLSys.
[5] H. Massalin. Superoptimizer: a look at the smallest program , 1987, ASPLOS.
[6] Vivienne Sze,et al. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks , 2017, IEEE Journal of Solid-State Circuits.
[7] Thierry Moreau,et al. Learning to Optimize Tensor Programs , 2018, NeurIPS.
[8] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[9] Sanjit A. Seshia,et al. Combinatorial sketching for finite programs , 2006, ASPLOS XII.
[10] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[11] Sumit Gulwani,et al. Synthesis of loop-free programs , 2011, PLDI '11.
[12] Alastair David Reid. Who guards the guards? formal validation of the Arm v8-m architecture specification , 2017, Proc. ACM Program. Lang.
[13] Rastislav Bodík,et al. Chlorophyll: Synthesis-Aided Compiler for Low-Power Spatial Architectures by Phitchaya Mangpo Phothilimthana , 2015.
[14] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[15] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Emina Torlak,et al. Growing solver-aided languages with rosette , 2013, Onward!.
[17] Emina Torlak,et al. Optimizing synthesis with metasketches , 2016, POPL.
[18] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[19] Philip Heng Wai Leong,et al. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.
[20] Shoaib Kamil,et al. The tensor algebra compiler , 2017, Proc. ACM Program. Lang.
[21] Thierry Moreau,et al. Automating Generation of Low Precision Deep Learning Operators , 2018, ArXiv.
[22] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[23] Jian Sun,et al. Deep Learning with Low Precision by Half-Wave Gaussian Quantization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Ran El-Yaniv,et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations , 2016, J. Mach. Learn. Res.
[25] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[26] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[27] Hadi Esmaeilzadeh,et al. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network , 2017, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[28] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.
[29] Ali Farhadi,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.
[30] Patrick Judd,et al. Stripes: Bit-serial deep neural network computing , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[31] Bertrand A. Maher,et al. Glow: Graph Lowering Compiler Techniques for Neural Networks , 2018, ArXiv.
[32] Shuchang Zhou,et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients , 2016, ArXiv.
[33] John Tran,et al. cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.
[34] Bo Chen,et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Thierry Moreau,et al. A Hardware–Software Blueprint for Flexible Deep Learning Specialization , 2018, IEEE Micro.
[36] Magnus Jahre,et al. Towards efficient quantized neural network inference on mobile devices: work-in-progress , 2017, CASES.
[37] Magnus Själander,et al. BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[38] Hari Angepat,et al. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave , 2018, IEEE Micro.
[39] Jiangming Jin,et al. BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[40] Yangqing Jia,et al. High performance ultra-low-precision convolutions on mobile devices , 2017, ArXiv.
[41] Albert Cohen,et al. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions , 2018, ArXiv.
[42] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.