Paper information - ApproxHPVM: a portable compiler IR for accuracy-aware optimizations

We propose ApproxHPVM, a compiler IR and system designed to enable accuracy-aware performance and energy tuning on heterogeneous systems with multiple compute units and approximation methods. ApproxHPVM automatically translates end-to-end application-level quality metrics into accuracy requirements for individual operations. It performs this translation in a hardware-agnostic accuracy-tuning phase, which provides greater portability across heterogeneous hardware platforms and enables future capabilities such as accuracy-aware dynamic scheduling and design space exploration. ApproxHPVM incorporates three main components: (a) a compiler IR with hardware-agnostic approximation metrics, (b) a hardware-agnostic accuracy-tuning phase that identifies error-tolerant computations, and (c) an accuracy-aware hardware scheduler that maps error-tolerant computations to approximate hardware components. Because the ApproxHPVM IR does not incorporate any hardware-specific knowledge, it can serve as a portable virtual ISA that can be shipped to all kinds of hardware platforms. We evaluate our framework on nine benchmarks from the deep learning domain and five image processing benchmarks. Our results show that the framework can offload chunks of approximable computations to special-purpose accelerators that provide significant gains in performance and energy, while staying within user-specified application-level quality metrics with high probability. Across the 14 benchmarks, we observe 1-9x performance speedups and 1.1-11.3x energy reductions for very small reductions in accuracy.
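
The split between a hardware-agnostic tolerance per operation and a later hardware-specific mapping step can be illustrated with a small sketch. This is not ApproxHPVM's IR or scheduler: the `Op` and `Unit` types, the single scalar error value, the unit names, and all numbers below are hypothetical, chosen only to show how a tolerance attached to an operation might drive the choice of compute unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    max_error: float  # hypothetical tolerance, as a tuning phase might assign it


@dataclass(frozen=True)
class Unit:
    name: str
    error: float   # hypothetical error this unit introduces per operation
    energy: float  # hypothetical relative energy cost per operation


def schedule(ops, units):
    """Map each op to the lowest-energy unit whose error fits the op's tolerance."""
    mapping = {}
    for op in ops:
        feasible = [u for u in units if u.error <= op.max_error]
        if not feasible:
            raise ValueError(f"no unit satisfies the tolerance of {op.name}")
        mapping[op.name] = min(feasible, key=lambda u: u.energy).name
    return mapping


# Illustrative units: an exact one and two progressively cheaper approximate ones.
units = [
    Unit("gpu_fp32", error=0.0, energy=1.0),
    Unit("gpu_fp16", error=0.01, energy=0.6),
    Unit("analog_accel", error=0.05, energy=0.2),
]

# Illustrative operations with tolerances that carry no hardware knowledge.
ops = [Op("conv1", 0.0), Op("conv2", 0.02), Op("fc1", 0.1)]

mapping = schedule(ops, units)
print(mapping)
```

Because the tolerances on `ops` do not mention any device, the same list could be passed to `schedule` with a different `units` list for another platform, which is the portability argument the abstract makes.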
