ApproxHPVM: a portable compiler IR for accuracy-aware optimizations
Vikram S. Adve | Hashim Sharif | Prakalp Srivastava | Maria Kotsifakou | Sarita Adve | Sasa Misailovic | Muhammad Huzaifa | Nathan Zhao | Keyur Joshi | Yasmin Sarita
[1] Anand Raghunathan,et al. Best-effort parallel execution framework for Recognition and mining applications , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[2] Scott A. Mahlke,et al. D2MA: Accelerating coarse-grained data transfer for GPUs , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[3] Dan Grossman,et al. Probability type inference for flexible approximate programming , 2015, OOPSLA.
[4] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[5] Ben Sander. HSAIL: Portable compiler IR for HSA , 2013, 2013 IEEE Hot Chips 25 Symposium (HCS).
[6] Sujan Kumar Gonugondla,et al. PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[7] Charbel Sakr,et al. Analytical Guarantees on Numerical Precision of Deep Neural Networks , 2017, ICML.
[8] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[9] Yann LeCun,et al. The MNIST database of handwritten digits , 2005.
[10] Martin C. Rinard. Probabilistic accuracy bounds for fault-tolerant computations that discard tasks , 2006, ICS '06.
[11] Sarita V. Adve,et al. Stash: Have your scratchpad and cache it too , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[12] Rakesh Kumar,et al. VideoChef: Efficient Approximation for Streaming Video Processing Pipelines , 2018, USENIX Annual Technical Conference.
[13] Luis Ceze,et al. General-purpose code acceleration with limited-precision analog computation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[14] Jianfei Cai,et al. Robust Transmission of JPEG2000 Encoded Images Over Packet Loss Channels , 2007, 2007 IEEE International Conference on Multimedia and Expo.
[15] Shoaib Kamil,et al. OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[16] Kalyan Veeramachaneni,et al. Autotuning algorithmic choice for input sensitivity , 2015, PLDI.
[17] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[18] Thu D. Nguyen,et al. ApproxHadoop: Bringing Approximations to MapReduce Frameworks , 2015, ASPLOS.
[19] Sarita V. Adve,et al. HPVM: heterogeneous parallel virtual machine , 2018, PPoPP.
[20] Weng-Fai Wong,et al. Exploiting half precision arithmetic in Nvidia GPUs , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).
[21] Gu-Yeon Wei,et al. HELIX-UP: Relaxing program semantics to unleash parallelization , 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[22] Sachin S. Talathi,et al. Fixed Point Quantization of Deep Convolutional Networks , 2015, ICML.
[23] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Pietro Perona,et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.
[25] Martin C. Rinard,et al. Chisel: reliability- and accuracy-aware optimization of approximate computational kernels , 2014, OOPSLA.
[26] Dong Han,et al. Cambricon: An Instruction Set Architecture for Neural Networks , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[27] Michael G. Strintzis,et al. Optimized transmission of JPEG2000 streams over wireless channels , 2006, IEEE Transactions on Image Processing.
[28] Surendra Byna,et al. Exploiting the forgiving nature of applications for scalable parallel execution , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[29] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.
[30] Martin C. Rinard,et al. Verifying quantitative reliability for programs that execute on unreliable hardware , 2013, OOPSLA.
[31] Daniel M. Roy,et al. Probabilistically Accurate Program Transformations , 2011, SAS.
[32] Lawrence D. Jackel,et al. Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.
[33] Alexander Aiken,et al. Stochastic optimization of floating-point programs with tunable precision , 2014, PLDI.
[34] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[35] Sujan Kumar Gonugondla,et al. A Variation-Tolerant In-Memory Machine Learning Classifier via On-Chip Training , 2018, IEEE Journal of Solid-State Circuits.
[36] Hao Wu,et al. Mixed Precision Training , 2017, ICLR.
[37] Alan Edelman,et al. Language and compiler support for auto-tuning variable-accuracy algorithms , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[38] Woongki Baek,et al. Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.
[39] Alan Edelman,et al. PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.
[40] John Tran,et al. cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.
[41] Zeyuan Allen Zhu,et al. Randomized accuracy-aware program transformations for efficient approximate computations , 2012, POPL '12.
[42] Bertrand A. Maher,et al. Glow: Graph Lowering Compiler Techniques for Neural Networks , 2018, ArXiv.
[43] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[44] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[45] Henry Hoffmann,et al. Quality of service profiling , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.
[46] Natalie D. Enright Jerger,et al. Exploiting Errors for Efficiency: A Survey from Circuits to Algorithms , 2018, ArXiv.
[47] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009.
[48] Martin C. Rinard,et al. Parallelizing Sequential Programs with Statistical Accuracy Tests , 2013, TECS.
[49] Henry Hoffmann,et al. Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.
[50] James Demmel,et al. Precimonious: Tuning assistant for floating-point precision , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[51] Dan Grossman,et al. EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.
[52] Scott A. Mahlke,et al. Paraprox: pattern-based approximation for data parallel applications , 2014, ASPLOS.
[53] Stelios Sidiroglou,et al. Dancing with uncertainty , 2012, RACES '12.
[54] Henry Hoffmann,et al. Dynamic knobs for responsive power-aware computing , 2011, ASPLOS XVI.
[55] Tianshi Chen,et al. ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[56] Joel Emer,et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.