Efficient AI System Design With Cross-Layer Approximate Computing
Swagath Venkataramani | Xiao Sun | Mingu Kang | Moriyoshi Ohara | Vijayalakshmi Srinivasan | Wei Wang | George Gristede | Pong-Fei Lu | Sunil Shukla | Kailash Gopalakrishnan | Jinwook Oh | Chia-Yu Chen | Kazuaki Ishizaki | Christos Vezyrtzis | Leland Chang | Shih-Hsien Lo | Nianzheng Cao | Naigang Wang | Michael Guillorn | Marcel Schaal | Ching Zhou | Fanchieh Yee | Shubham Jain | Matthew Ziegler | Tina Babinsky | Howard Haynie | Jungwook Choi | Ankur Agarwal | Thomas Fox | Bruce Fleischer | Hiroshi Inoue | Michael Klaiber | Gary Maier | Silvia Mueller | Michael Scheuermann | Eri Ogawa | Mauricio Serrano | Joel Silberman | Jintao Zhang | Brian Curran