Sculptor: Flexible Approximation with Selective Dynamic Loop Perforation

Loop perforation is one of the most well known software techniques in approximate computing. It transforms loops to periodically skip subsets of their iterations. It is general, simple, and effective. However, during analysis, it only considers the number of instructions to skip, but does not consider the differences between instructions and loop iterations. Based on our observation, these differences have considerable influence on performance and accuracy. To improve traditional perforation, we introduce selective dynamic loop perforation, a general approximation technique that automatically transforms loops to skip selected instructions in selected iterations. It provides the flexibility to craft approximation strategies at the dynamic instruction level. The main challenges in selective dynamic loop perforation are how to capture the characteristics of instructions, optimize perforation strategies based on these characteristics, and minimize additional runtime overhead. In this paper, we propose several compiler optimizations to resolve these challenges, including optimized instruction-level, load based and store based selective perforation, and self-directed dynamic perforation with a dynamic start and dynamic perforation rates. Across a range of 8 applications from various domains, selective dynamic loop perforation achieves average speedups of 2.89x and 4.07x with 5% and 10% error budgets, while traditional loop perforation achieves 1.47x and 1.93x, respectively, for the same error budgets.

[1]  Mario Badr,et al.  Load Value Approximation , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[2]  Daniel M. Roy,et al.  Probabilistically Accurate Program Transformations , 2011, SAS.

[3]  Natalie D. Enright Jerger,et al.  Doppelgänger: A cache for approximate computing , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4]  Scott A. Mahlke,et al.  SAGE: Self-tuning approximation for graphics engines , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[5]  Martin Rinard,et al.  Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures , 2009 .

[6]  Luis Ceze,et al.  General-purpose code acceleration with limited-precision analog computation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[7]  Gerlof Bouma,et al.  Normalized (pointwise) mutual information in collocation extraction , 2009 .

[8]  Thu D. Nguyen,et al.  ApproxHadoop: Bringing Approximations to MapReduce Frameworks , 2015, ASPLOS.

[9]  Gu-Yeon Wei,et al.  HELIX-UP: Relaxing program semantics to unleash parallelization , 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[10]  Henry Hoffmann,et al.  Application heartbeats for software performance and health , 2010, PPoPP '10.

[11]  Douglas L. Jones,et al.  Scalable stochastic processors , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[12]  Scott A. Mahlke,et al.  Input responsiveness: using canary inputs to dynamically steer approximation , 2016, PLDI.

[13]  Hadi Esmaeilzadeh,et al.  Neural acceleration for GPU throughput processors , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[14]  Luis Ceze,et al.  Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.

[15]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.

[16]  Martin C. Rinard,et al.  Parallelizing Sequential Programs with Statistical Accuracy Tests , 2013, TECS.

[17]  Henry Hoffmann,et al.  Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.

[18]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[19]  Sarita V. Adve,et al.  Approxilyzer: Towards a systematic framework for instruction-level approximate computing and its application to hardware resiliency , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[20]  David Atienza,et al.  Design of energy efficient and dependable health monitoring systems under unreliable nanometer technologies , 2012, BODYNETS.

[21]  Natalie D. Enright Jerger,et al.  The Bunker Cache for spatio-value approximation , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[22]  Henry Hoffmann,et al.  JouleGuard: energy guarantees for approximate applications , 2015, SOSP.

[23]  Scott A. Mahlke,et al.  Paraprox: pattern-based approximation for data parallel applications , 2014, ASPLOS.

[24]  Mark D. Corner,et al.  Eon: a language and runtime system for perpetual systems , 2007, SenSys '07.

[25]  Alan Edelman,et al.  Language and compiler support for auto-tuning variable-accuracy algorithms , 2011, International Symposium on Code Generation and Optimization (CGO 2011).

[26]  Luis Ceze,et al.  Architecture support for disciplined approximate programming , 2012, ASPLOS XVII.

[27]  Onur Mutlu,et al.  Mitigating the Memory Bottleneck With Approximate Load Value Prediction , 2016, IEEE Design & Test.

[28]  Jacob Nelson,et al.  SNNAP: Approximate computing on programmable SoCs via neural acceleration , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[29]  Scott A. Mahlke,et al.  Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[30]  Martin C. Rinard,et al.  Chisel: reliability- and accuracy-aware optimization of approximate computational kernels , 2014, OOPSLA.

[31]  Dan Grossman,et al.  EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.

[32]  Scott A. Mahlke,et al.  Harnessing Soft Computations for Low-Budget Fault Tolerance , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[33]  Martin C. Rinard Probabilistic accuracy bounds for fault-tolerant computations that discard tasks , 2006, ICS '06.

[34]  Onur Mutlu,et al.  RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads , 2016, ACM Trans. Archit. Code Optim..

[35]  Woongki Baek,et al.  Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.

[36]  Dan Grossman,et al.  Probability type inference for flexible approximate programming , 2015, OOPSLA.

[37]  Ion Stoica,et al.  BlinkDB: queries with bounded errors and bounded response times on very large data , 2012, EuroSys '13.

[38]  Saurabh Bagchi,et al.  Phase-aware optimization in approximate computing , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[39]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).