Enabling Scientific Computing on Memristive Accelerators
Engin Ipek | Shibo Wang | Ben Feinberg | Uday Kumar Reddy Vengalam | Nathan Whitehair
[1] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[2] Z. Wei,et al. Highly reliable TaOx ReRAM and direct evidence of redox reaction mechanism , 2008, 2008 IEEE International Electron Devices Meeting.
[3] Engin Ipek,et al. Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning , 2017 .
[4] Yusuf Leblebici,et al. A 3.1 mW 8b 1.2 GS/s Single-Channel Asynchronous SAR ADC With Alternate Comparators for Enhanced Speed in 32 nm Digital SOI CMOS , 2013, IEEE Journal of Solid-State Circuits.
[5] Eric S. Chung,et al. Towards a Universal FPGA Matrix-Vector Multiplication Architecture , 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.
[6] Florin Pop. High Performance Numerical Computing for High Energy Physics: A New Challenge for Big Data Science , 2014 .
[7] V. Springel,et al. Properties of galaxies reproduced by a hydrodynamic simulation , 2014, Nature.
[8] M. Hestenes,et al. Methods of conjugate gradients for solving linear systems , 1952 .
[9] Cong Xu,et al. Design of cross-point metal-oxide ReRAM emphasizing reliability and cost , 2013, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[10] William Rhett Davis,et al. FreePDK15: An Open-Source Predictive Process Design Kit for 15nm FinFET Technology , 2015, ISPD.
[11] Dejan Markovic,et al. A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs , 2014, FPGA.
[12] Yiran Chen,et al. GraphR: Accelerating Graph Processing Using ReRAM , 2017, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[13] R. Pielke. Mesoscale Meteorological Modeling , 1984 .
[14] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[15] Bernhard Schölkopf,et al. Kernel Methods in Computational Biology , 2005 .
[16] Paul Messina,et al. The Exascale Computing Project , 2017, Comput. Sci. Eng..
[17] Nam Sung Kim,et al. GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.
[18] Tao Zhang,et al. Overcoming the challenges of crossbar resistive memory architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[19] J. Yang,et al. High switching endurance in TaOx memristive devices , 2010 .
[20] Chung-Wei Hsu,et al. Self-rectifying bipolar TaOx/TiO2 RRAM with superior endurance over 10^12 cycles for 3D high-density storage-class memory , 2013, 2013 Symposium on VLSI Technology.
[21] Lilia Maliar,et al. Numerical Methods for Large-Scale Dynamic Economic Models , 2014 .
[22] Jack J. Dongarra,et al. The LINPACK Benchmark: past, present and future , 2003, Concurr. Comput. Pract. Exp..
[23] Chris Yakopcic,et al. Model for maximum crossbar size based on input driver impedance , 2016 .
[24] Henk A. van der Vorst,et al. Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems , 1992, SIAM J. Sci. Comput..
[25] R. Sarpeshkar,et al. A 10-nW 12-bit accurate analog storage cell with 10-aA leakage , 2004, IEEE Journal of Solid-State Circuits.
[26] James Demmel,et al. IEEE Standard for Floating-Point Arithmetic , 2008 .
[27] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[28] Y. Saad,et al. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems , 1986 .
[29] Franz Franchetti,et al. Accelerating sparse matrix-matrix multiplication with 3D-stacked logic-in-memory hardware , 2013, 2013 IEEE High Performance Extreme Computing Conference (HPEC).
[30] Gokcen Kestor,et al. Quantifying the energy cost of data movement in scientific applications , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).
[31] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[32] David H. Bailey,et al. High-precision floating-point arithmetic in scientific computation , 2004, Computing in Science & Engineering.
[33] Wouter A. Serdijn,et al. Analysis of Power Consumption and Linearity in Capacitive Digital-to-Analog Converters Used in Successive Approximation ADCs , 2011, IEEE Transactions on Circuits and Systems I: Regular Papers.
[34] Catherine Graves,et al. Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[35] Jack J. Dongarra,et al. Efficiency of General Krylov Methods on GPUs -- An Experimental Study , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[36] Engin Ipek,et al. Making Memristive Neural Network Accelerators Reliable , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[37] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[38] Mayler G. A. Martins,et al. Open Cell Library in 15nm FreePDK Technology , 2015, ISPD.
[39] Ligang Gao,et al. High precision tuning of state for memristive devices by adaptable variation-tolerant algorithm , 2011, Nanotechnology.
[40] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[41] Mark Horowitz,et al. FPU Generator for Design Space Exploration , 2013, 2013 IEEE 21st Symposium on Computer Arithmetic.
[42] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[43] Shyh-Chyi Wong,et al. Modeling of interconnect capacitance, delay, and crosstalk in VLSI , 2000 .
[44] Andrew B. Kahng,et al. CACTI 7 , 2017, ACM Trans. Archit. Code Optim..
[45] L. V. Allis,et al. Searching for solutions in games and artificial intelligence , 1994 .
[46] Mircea R. Stan,et al. Bus-invert coding for low-power I/O , 1995, IEEE Trans. Very Large Scale Integr. Syst..
[47] Tao Zhang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[48] Alex Fit-Florea,et al. Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs , 2011 .
[49] Saibal Mukhopadhyay,et al. A programmable hardware accelerator for simulating dynamical systems , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[50] Thomas Toifl,et al. A 10b 1.5GS/s pipelined-SAR ADC with background second-stage common-mode regulation and offset calibration in 14nm CMOS FinFET , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).
[51] Karin Strauss,et al. A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.
[52] Subramanian S. Iyer,et al. A 14 nm 1.1 Mb Embedded DRAM Macro With 1 ns Access , 2016, IEEE Journal of Solid-State Circuits.
[53] Yiran Chen,et al. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).