Accelerating divergent applications on SIMD architectures using neural networks