Neural acceleration for GPU throughput processors
暂无分享,去创建一个
Hadi Esmaeilzadeh | Jongse Park | Amir Yazdanbakhsh | Pejman Lotfi-Kamran | Hardik Sharma | H. Esmaeilzadeh | Jongse Park | A. Yazdanbakhsh | Hardik Sharma | P. Lotfi-Kamran | Pejman Lotfi-Kamran
[1] Henry Hoffmann,et al. Patterns and statistical analysis for understanding reduced resource computing , 2010, OOPSLA.
[2] Quinn Jacobson,et al. ERSA: error resilient system architecture for probabilistic applications , 2010, DATE 2010.
[3] Donald Yeung,et al. Application-Level Correctness and its Impact on Fault Tolerance , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[4] Karthik Pattabiraman,et al. Flicker: Saving Refresh-Power in Mobile Devices through Critical Data Partitioning , 2009 .
[5] Krishna V. Palem,et al. Ultra-Efficient (Embedded) SOC Architectures based on Probabilistic CMOS (PCMOS) Technology , 2006, Proceedings of the Design Automation & Test in Europe Conference.
[6] Mahmut T. Kandemir,et al. A case for Core-Assisted Bottleneck Acceleration in GPUs: Enabling flexible data compression with assist warps , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[7] Andreas Gerstlauer,et al. Approximate logic synthesis under general error magnitude and frequency constraints , 2013, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[8] Scott A. Mahlke,et al. Paraprox: pattern-based approximation for data parallel applications , 2014, ASPLOS.
[9] Glenn Reinman,et al. BRAINIAC: Bringing reliable accuracy into neurally-implemented approximate computing , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[10] John Sartori,et al. Branch and Data Herding: Reducing Control and Memory Divergence for Error-Tolerant GPU Applications , 2012, IEEE Transactions on Multimedia.
[11] Jacob Nelson,et al. SNNAP: Approximate computing on programmable SoCs via neural acceleration , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[12] Zheng Li,et al. Continuous real-world inputs can open up alternative accelerator designs , 2013, ISCA.
[13] Song Liu,et al. Flikker: saving DRAM refresh-power through critical data partitioning , 2011, ASPLOS XVI.
[14] M. Valero,et al. Fuzzy memoization for floating-point multimedia applications , 2005, IEEE Transactions on Computers.
[15] Luis Ceze,et al. General-purpose code acceleration with limited-precision analog computation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[16] Douglas L. Jones,et al. Scalable stochastic processors , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).
[17] Mohammad Zubair,et al. High Performance Implementation of an Econometrics and Financial Application on GPUs , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[18] Olivier Temam,et al. Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators , 2014, 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC).
[19] Karthikeyan Sankaralingam,et al. Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.
[20] Sherief Reda,et al. ABACUS: A technique for automated behavioral synthesis of approximate computing circuits , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[21] Glenn Reinman,et al. Accelerating divergent applications on SIMD architectures using neural networks , 2014, ICCD.
[22] Donald Yeung,et al. Exploiting Application-Level Correctness for Low-Cost Fault Tolerance , 2008, J. Instr. Level Parallelism.
[23] Martin C. Rinard,et al. Verifying quantitative reliability for programs that execute on unreliable hardware , 2013, OOPSLA.
[24] Steven Swanson,et al. Conservation cores: reducing the energy of mature computations , 2010, ASPLOS XV.
[25] Luis Ceze,et al. Architecture support for disciplined approximate programming , 2012, ASPLOS XVII.
[26] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[27] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[28] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[29] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[30] Joel C. Huegel,et al. Inverse Kinematics Solution for Robotic Manipulators Using a CUDA-Based Parallel Genetic Algorithm , 2011, MICAI.
[31] Mohammad Zubair,et al. A High Performance Implementation of Likelihood Estimators on GPUs , 2013 .
[32] Babak Falsafi,et al. Toward Dark Silicon in Servers , 2011, IEEE Micro.
[33] Scott A. Mahlke,et al. SAGE: Self-tuning approximation for graphics engines , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[34] Huawei Li,et al. A Fault Criticality Evaluation Framework of Digital Systems for Error Tolerant Video Applications , 2011, 2011 Asian Test Symposium.
[35] John Banning,et al. : An Efficient , 2022 .
[36] Vicky Wong,et al. Soft Error Resilience of Probabilistic Inference Applications , 2006 .
[37] Kaushik Roy,et al. ASLAN: Synthesis of approximate sequential circuits , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[38] Lingamneni Avinash,et al. Synthesizing Parsimonious Inexact Circuits through Probabilistic Design Techniques , 2013, TECS.
[39] Kia Bazargan,et al. Axilog: Language support for approximate hardware design , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[40] Richard M. Karp,et al. Algorithmic methodologies for ultra-efficient inexact architectures for sustaining technology scaling , 2012, CF '12.
[41] Norman P. Jouppi,et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[42] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.
[43] Jose-Maria Arnau,et al. Eliminating redundant fragment shader executions on a mobile GPU via hardware memoization , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[44] Dan Grossman,et al. EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.
[45] K. Sankaralingam,et al. Exploring the Synergy of Emerging Workloads and Silicon Reliability Trends , 2009 .
[46] Nam Sung Kim,et al. GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.
[47] Jacob Nelson,et al. Approximate storage in solid-state memories , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[48] Kunle Olukotun,et al. EMEURO: A framework for generating multi-purpose accelerators via deep learning , 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[49] Woongki Baek,et al. Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.
[50] Naresh R. Shanbhag,et al. Energy-efficient signal processing via algorithmic noise-tolerance , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).
[51] Alan Edelman,et al. PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.
[52] Gu-Yeon Wei,et al. Process Variation Tolerant 3T1D-Based Cache Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[53] Karthikeyan Sankaralingam,et al. Relax: an architectural framework for software recovery of hardware faults , 2010, ISCA.
[54] Kaushik Roy,et al. SALSA: Systematic logic synthesis of approximate circuits , 2012, DAC Design Automation Conference 2012.
[55] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[56] Henry Hoffmann,et al. Quality of service profiling , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.
[57] Henry Hoffmann,et al. Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.
[58] Kaushik Roy,et al. Quality programmable vector processors for approximate computing , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[59] Xin Zhang,et al. FlexJava: language support for safe and modular approximate programming , 2015, ESEC/SIGSOFT FSE.
[60] Mike O'Connor,et al. Cache-Conscious Wavefront Scheduling , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.