AxMemo: Hardware-Compiler Co-Design for Approximate Code Memoization
Nam Sung Kim | Hadi Esmaeilzadeh | Amir Yazdanbakhsh | Zhenhong Liu | Dong Kai Wang
[1] Hadi Esmaeilzadeh,et al. Neural acceleration for GPU throughput processors , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[2] Wei Zhang,et al. Low-Power FPGA Design Using Memoization-Based Approximate Computing , 2016, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[3] Gu-Yeon Wei,et al. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[4] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[5] Guowei Zhang,et al. Leveraging Hardware Caches for Memoization , 2018, IEEE Computer Architecture Letters.
[6] David M. Brooks,et al. ISA-independent workload characterization and its implications for specialized architectures , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[7] Alexander Moreno,et al. Speeding up Large-Scale Financial Recomputation with Memoization , 2014, 2014 Seventh Workshop on High Performance Computational Finance.
[8] Tajana Rosing,et al. Nvalt: Nonvolatile Approximate Lookup Table for GPU Acceleration , 2018, IEEE Embedded Systems Letters.
[9] Natalie D. Enright Jerger,et al. The Bunker Cache for spatio-value approximation , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[10] Karthikeyan Sankaralingam,et al. Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.
[11] Mateo Valero,et al. ATM: Approximate Task Memoization in the Runtime System , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[12] Josep Torrellas,et al. SoftSig: Software-Exposed Hardware Signatures for Code Analysis and Optimization , 2008, IEEE Micro.
[13] Hiroshi Nakashima,et al. Design and evaluation of an auto-memoization processor , 2007, Parallel and Distributed Computing and Networks.
[14] W. W. Peterson,et al. Cyclic Codes for Error Detection , 2022.
[15] Mario Badr,et al. Load Value Approximation , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[16] Hadi Esmaeilzadeh,et al. AxBench: A Multiplatform Benchmark Suite for Approximate Computing , 2017, IEEE Design & Test.
[17] Song Liu,et al. Flikker: saving DRAM refresh-power through critical data partitioning , 2011, ASPLOS XVI.
[18] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[19] Onur Mutlu,et al. RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads , 2016, ACM Trans. Archit. Code Optim.
[20] Nam Sung Kim,et al. Load-Triggered Warp Approximation on GPU , 2018, ISLPED.
[21] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[22] S. Richardson. Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation , 1992.
[23] Farinaz Koushanfar,et al. LookNN: Neural network with no multiplication , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
[24] Wen-mei W. Hwu,et al. Compiler-directed dynamic computation reuse: rationale and initial results , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[25] William J. Dally,et al. GPUs and the Future of Parallel Computing , 2011, IEEE Micro.
[26] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[27] Henry Hoffmann,et al. Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.
[28] Scott A. Mahlke,et al. SAGE: Self-tuning approximation for graphics engines , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[29] Jacob Nelson,et al. Approximate storage in solid-state memories , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[30] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[31] Mikko H. Lipasti,et al. Value locality and load value prediction , 1996, ASPLOS VII.
[32] Scott A. Mahlke,et al. Paraprox: pattern-based approximation for data parallel applications , 2014, ASPLOS.
[33] Norman P. Jouppi,et al. CACTI 6.0: A Tool to Model Large Caches , 2009.
[34] Nikolaos Hardavellas,et al. Temporal Approximate Function Memoization , 2018, IEEE Micro.