PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms
暂无分享,去创建一个
Sujan Kumar Gonugondla | Naresh R. Shanbhag | Nam Sung Kim | Mingu Kang | Vikram S. Adve | Prakalp Srivastava | Jungwook Choi | Sungmin Lim | Sujan K. Gonugondla | Naresh R Shanbhag | Jungwook Choi | N. Kim | Mingu Kang | Prakalp Srivastava | Sungmin Lim
[1] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.
[2] Naresh R. Shanbhag,et al. An energy-efficient VLSI architecture for pattern recognition via deep embedding of computation in SRAM , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Simha Sethumadhavan,et al. Hybrid Analog-Digital Solution of Nonlinear Partial Differential Equations , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[4] Charbel Sakr,et al. An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Sigurd Wagner,et al. 16.2 A large-area image sensing and detection system based on embedded thin-film classifiers , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.
[6] Daisuke Miyashita,et al. Mixed-signal circuits for embedded machine-learning applications , 2015, 2015 49th Asilomar Conference on Signals, Systems and Computers.
[7] Martin C. Rinard,et al. Chisel: reliability- and accuracy-aware optimization of approximate computational kernels , 2014, OOPSLA.
[8] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[9] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[10] Dong Han,et al. Cambricon: An Instruction Set Architecture for Neural Networks , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[11] Sujan Kumar Gonugondla,et al. A 19.4-nJ/Decision, 364-K Decisions/s, In-Memory Random Forest Multi-Class Inference Accelerator , 2018, IEEE Journal of Solid-State Circuits.
[12] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..
[13] Xuehai Zhou,et al. PuDianNao: A Polyvalent Machine Learning Accelerator , 2015, ASPLOS.
[14] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools (2nd Edition) , 2006 .
[15] Charbel Sakr,et al. Analytical Guarantees on Numerical Precision of Deep Neural Networks , 2017, ICML.
[16] John Salvatier,et al. Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.
[17] Kathryn S. McKinley,et al. Uncertain: a first-order type for uncertain data , 2014, ASPLOS.
[18] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[19] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[20] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[21] Len Thomas,et al. A method for detecting whistles, moans, and other frequency contour sounds. , 2011, The Journal of the Acoustical Society of America.
[22] Yann LeCun,et al. The mnist database of handwritten digits , 2005 .
[23] Sanu Mathew,et al. 14.4 A 21.5M-query-vectors/s 3.37nJ/vector reconfigurable k-nearest-neighbor accelerator with adaptive precision in 14nm tri-gate CMOS , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).
[24] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[25] David Blaauw,et al. Compute Caches , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[26] Sujan Kumar Gonugondla,et al. A 42pJ/decision 3.12TOPS/W robust in-memory machine learning classifier with on-chip training , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).
[27] Yuan Gao,et al. RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[28] Ninghui Sun,et al. DianNao family , 2016, Commun. ACM.
[29] Tao Zhang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[30] Zhuo Wang,et al. In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array , 2017, IEEE Journal of Solid-State Circuits.
[31] Gu-Yeon Wei,et al. 14.3 A 28nm SoC with a 1.2GHz 568nJ/prediction sparse deep-neural-network engine with >0.1 timing error rate tolerance for IoT applications , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).
[32] Roberto Brunelli,et al. Face Recognition: Features Versus Templates , 1993, IEEE Trans. Pattern Anal. Mach. Intell..
[33] Naresh R. Shanbhag,et al. An energy-efficient memory-based high-throughput VLSI architecture for convolutional networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[35] K. Kim,et al. Face recognition using kernel principal component analysis , 2002, IEEE Signal Process. Lett..
[36] Sujan Kumar Gonugondla,et al. A Multi-Functional In-Memory Inference Processor Using a Standard 6T SRAM Array , 2018, IEEE Journal of Solid-State Circuits.
[37] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[38] S. Simon Wong,et al. 24.2 A 2.5GHz 7.7TOPS/W switched-capacitor matrix multiplier with co-designed local memory in 40nm , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).
[39] Alan Edelman,et al. Julia: A Fresh Approach to Numerical Computing , 2014, SIAM Rev..
[40] Naresh R. Shanbhag,et al. A 19.4 nJ/decision 364K decisions/s in-memory random forest classifier in 6T SRAM array , 2017, ESSCIRC 2017 - 43rd IEEE European Solid State Circuits Conference.
[41] Siddharth Joshi,et al. 21.7 2pJ/MAC 14b 8×8 linear transform mixed-signal spatial filter in 65nm CMOS with 84dB interference suppression , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).
[42] Yann LeCun,et al. CNP: An FPGA-based processor for Convolutional Networks , 2009, 2009 International Conference on Field Programmable Logic and Applications.
[43] Dan Grossman,et al. EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.
[44] Naresh R. Shanbhag,et al. In-Memory Computing Architectures for Sparse Distributed Memory , 2016, IEEE Transactions on Biomedical Circuits and Systems.
[45] Michael P. Flynn,et al. A 3.43TOPS/W 48.9pJ/pixel 50.1nJ/classification 512 analog neuron sparse coding neural network with on-chip learning and classification in 40nm CMOS , 2017, 2017 Symposium on VLSI Circuits.
[46] Yu Wang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[47] Trevor Mudge,et al. Razor: a low-power pipeline based on circuit-level timing speculation , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[48] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[49] Mark N. Wegman,et al. Efficiently computing static single assignment form and the control dependence graph , 1991, TOPL.
[50] Gert Cauwenberghs,et al. Kerneltron: Support Vector 'Machine' in Silicon , 2002, SVM.