论文信息 - Efficient Processing of Deep Neural Networks

Efficient Processing of Deep Neural Networks

Abstract This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many art...

V. Sze | J. Emer | Yu-hsin Chen | Tien-Ju Yang

[1] S. Belloni,et al. DeepBench , 2022, Proceedings of the 2022 workshop on 9th International Workshop of Testing Database Systems.

[2] Candace Moore,et al. Deep learning frameworks , 2021, Radiopaedia.org.

[3] Vivienne Sze,et al. An Architecture-Level Energy and Area Estimator for Processing-In-Memory Accelerator Designs , 2020, 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[4] Fjodor van Veen,et al. The Neural Network Zoo , 2020, Proceedings.

[5] Dirk Englund,et al. Digital Optical Neural Networks for Large-Scale Machine Learning , 2020, 2020 Conference on Lasers and Electro-Optics (CLEO).

[6] Jose Javier Gonzalez Ortiz,et al. What is the State of Neural Network Pruning? , 2020, MLSys.

[7] Michael Carbin,et al. Comparing Rewinding and Fine-tuning in Neural Network Pruning , 2020, ICLR.

[8] Luca P. Carloni,et al. Silicon Photonics Codesign for Deep Learning , 2020, Proceedings of the IEEE.

[9] Neurosurgeon , 2020, Definitions.

[10] Jonathan Chang,et al. 15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications , 2020, 2020 IEEE International Solid- State Circuits Conference - (ISSCC).

[11] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[12] Vivienne Sze,et al. Design Considerations for Efficient Deep Neural Networks on Processing-in-Memory Accelerators , 2019, 2019 IEEE International Electron Devices Meeting (IEDM).

[13] Gu-Yeon Wei,et al. A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM , 2019, ArXiv.

[14] Vivienne Sze,et al. Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs , 2019, 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[15] B. Murmann,et al. RRAM-Based In-Memory Computing for Embedded Deep Neural Networks , 2019, 2019 53rd Asilomar Conference on Signals, Systems, and Computers.

[16] Christian Enz,et al. Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing , 2019, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[17] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.

[18] Aamer Jaleel,et al. ExTensor: An Accelerator for Sparse Tensor Algebra , 2019, MICRO.

[19] David Wentzlaff,et al. ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs , 2019, MICRO.

[20] T. N. Vijaykumar,et al. SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks , 2019, MICRO.

[21] Wei Tang,et al. CASCADE: Connecting RRAMs to Extend Analog Dataflow In An End-To-End In-Memory Processing Paradigm , 2019, MICRO.

[22] William J. Dally,et al. Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture , 2019, MICRO.

[23] Hongyang Jia,et al. In-Memory Computing: Advances and prospects , 2019, IEEE Solid-State Circuits Magazine.

[24] Wooseok Yi,et al. BitBlade: Area and Energy-Efficient Precision-Scalable Neural Network Accelerator with Bitwise Summation , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).

[25] P. Bai,et al. Non-Volatile RRAM Embedded into 22FFL FinFET Technology , 2019, 2019 Symposium on VLSI Technology.

[26] William J. Dally,et al. A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm , 2019, 2019 Symposium on VLSI Circuits.

[27] Pradeep Dubey,et al. A Study of BFLOAT16 for Deep Learning Training , 2019, ArXiv.

[28] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[29] Quoc V. Le,et al. Searching for MobileNetV3 , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30] Jason Clemons,et al. Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration , 2019, ASPLOS.

[31] Christoforos E. Kozyrakis,et al. TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators , 2019, ASPLOS.

[32] Patrick Judd,et al. Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks , 2019, ASPLOS.

[33] Sertac Karaman,et al. FastDepth: Fast Monocular Depth Estimation on Embedded Systems , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[34] Rudy Lauwereins,et al. Sub-Word Parallel Precision-Scalable MAC Engines for Efficient Embedded DNN Inference , 2019, 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS).

[35] Brucek Khailany,et al. Timeloop: A Systematic Approach to DNN Accelerator Evaluation , 2019, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[36] Erich Elsen,et al. The State of Sparsity in Deep Neural Networks , 2019, ArXiv.

[37] George Papandreou,et al. DeeperLab: Single-Shot Image Parser , 2019, ArXiv.

[38] Arash AziziMazreah,et al. Shortcut Mining: Exploiting Cross-Layer Shortcut Reuse in DCNN Accelerators , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[39] Reetuparna Das,et al. Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[40] Lee-Sup Kim,et al. NAND-Net: Minimizing Computational Complexity of In-Memory Processing for Binary Neural Networks , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[41] Li Fei-Fei,et al. Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Tayfun Gokmen,et al. The Next Generation of Deep Learning Hardware: Analog Computing , 2019, Proceedings of the IEEE.

[43] Tadahiro Kuroda,et al. QUEST: Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS , 2019, IEEE Journal of Solid-State Circuits.

[44] Marian Verhelst,et al. An Always-On 3.8 $\mu$ J/86% CIFAR-10 Mixed-Signal Binary CNN Processor With All Memory on Chip in 28-nm CMOS , 2019, IEEE Journal of Solid-State Circuits.

[45] Niraj K. Jha,et al. ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46] Yuandong Tian,et al. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47] T. Ghani,et al. MRAM as Embedded Non-Volatile Memory Solution for 22FFL FinFET Technology , 2018, 2018 IEEE International Electron Devices Meeting (IEDM).

[48] Ryan Hamerly,et al. Large-Scale Optical Neural Networks based on Photoelectric Multiplication , 2018, Physical Review X.

[49] N. Verma,et al. A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing , 2018, ArXiv.

[50] H. T. Kung,et al. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization , 2018, ASPLOS.

[51] Mostafa Mahmoud,et al. Diffy: a Déjà vu-Free Differential Deep Neural Network Accelerator , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[52] Trevor Darrell,et al. Rethinking the Value of Network Pruning , 2018, ICLR.

[53] Song Han,et al. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware , 2018, ICLR.

[54] George Papandreou,et al. Searching for Efficient Multi-Scale Architectures for Dense Image Prediction , 2018, NeurIPS.

[55] Hoi-Jun Yoo,et al. DNPU: An Energy-Efficient Deep-Learning Processor with Heterogeneous Multi-Core Architecture , 2018, IEEE Micro.

[56] Marian Verhelst,et al. Laika: A 5uW Programmable LSTM Accelerator for Always-on Keyword Spotting in 65nm CMOS , 2018, ESSCIRC 2018 - IEEE 44th European Solid State Circuits Conference (ESSCIRC).

[57] Bo Chen,et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58] Aaron Klein,et al. Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search , 2018, ArXiv.

[59] Vivienne Sze,et al. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices , 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[60] Quoc V. Le,et al. Understanding and Simplifying One-Shot Architecture Search , 2018, ICML.

[61] Yiming Yang,et al. DARTS: Differentiable Architecture Search , 2018, ICLR.

[62] Hossein Valavi,et al. A Mixed-Signal Binarized Convolutional-Neural-Network Accelerator Integrating Dense Weight Storage and Multiplication for Reduced Data Movement , 2018, 2018 IEEE Symposium on VLSI Circuits.

[63] Eric S. Chung,et al. A Configurable Cloud-Scale DNN Processor for Real-Time AI , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[64] Leibo Liu,et al. An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28NM CMOS , 2018, 2018 IEEE Symposium on VLSI Circuits.

[65] Rajesh K. Gupta,et al. SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[66] Tao Li,et al. Prediction Based Execution on Deep Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[67] Jose-Maria Arnau,et al. Computation Reuse in DNNs by Exploiting Input Similarity , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[68] J. Hennessy. A new golden age for computer architecture: Domain-specific hardware/software co-design, enhanced security, open instruction sets, and agile chip development , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[69] Aleksander Madry,et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) , 2018, NeurIPS.

[70] Nam Sung Kim,et al. GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[71] David Blaauw,et al. Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[72] Hyoukjun Kwon,et al. MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators , 2018, ArXiv.

[73] Shoaib Kamil,et al. Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code , 2018, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[74] Saman P. Amarasinghe,et al. Format abstraction for sparse tensor algebra compilers , 2018, Proc. ACM Program. Lang..

[75] Sujan Kumar Gonugondla,et al. An In-Memory VLSI Architecture for Convolutional Neural Networks , 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[76] Hari Angepat,et al. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave , 2018, IEEE Micro.

[77] Mengjia Yan,et al. UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[78] Yi Luo,et al. All-optical machine learning using diffractive deep neural networks , 2018, Science.

[79] Bo Chen,et al. NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications , 2018, ECCV.

[80] Matthew Mattina,et al. Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[81] Hyoukjun Kwon,et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects , 2018, ASPLOS.

[82] Suren Jayasuriya,et al. EVA²: Exploiting Temporal Redundancy in Live Computer Vision , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[83] Michael Carbin,et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.

[84] Alok Aggarwal,et al. Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[85] Anantha Chandrakasan,et al. Conv-RAM: An energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[86] Meng-Fan Chang,et al. A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[87] Hoi-Jun Yoo,et al. UNPU: A 50.6TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[88] Marian Verhelst,et al. An always-on 3.8μJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28nm CMOS , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[89] Shimeng Yu,et al. Neuro-Inspired Computing With Emerging Nonvolatile Memorys , 2018, Proceedings of the IEEE.

[90] Sujan Kumar Gonugondla,et al. A Multi-Functional In-Memory Inference Processor Using a Standard 6T SRAM Array , 2018, IEEE Journal of Solid-State Circuits.

[91] Jonathan Ragan-Kelley,et al. Halide , 2017 .

[92] Hadi Esmaeilzadeh,et al. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network , 2017, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[93] F. Merrikh Bayat,et al. Fast, energy-efficient, robust, and reproducible mixed-signal neuromorphic classifier based on embedded NOR flash memory technology , 2017, 2017 IEEE International Electron Devices Meeting (IEDM).

[94] Elad Eban,et al. MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[95] Asit K. Mishra,et al. Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy , 2017, ICLR.

[96] Benoît Meister,et al. Polyhedral Optimization of TensorFlow Computation Graphs , 2017, ESPT/VPA@SC.

[97] Xin Wang,et al. Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks , 2017, NIPS.

[98] Oriol Vinyals,et al. Hierarchical Representations for Efficient Architecture Search , 2017, ICLR.

[99] Diana Marculescu,et al. NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks , 2017, ArXiv.

[100] Yuan Xie,et al. DRISA: A DRAM-based Reconfigurable In-Situ Accelerator , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[101] Onur Mutlu,et al. Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[102] Shoaib Kamil,et al. The tensor algebra compiler , 2017, Proc. ACM Program. Lang..

[103] Joel Emer,et al. A method to estimate the energy consumption of deep neural networks , 2017, 2017 51st Asilomar Conference on Signals, Systems, and Computers.

[104] Gang Sun,et al. Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[105] Eriko Nurvitadhi,et al. WRPN: Wide Reduced-Precision Networks , 2017, ICLR.

[106] Li Shen,et al. Deep Learning to Improve Breast Cancer Detection on Screening Mammography , 2017, Scientific Reports.

[107] Wei Wu,et al. Practical Block-Wise Neural Network Architecture Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[108] Vijay Vasudevan,et al. Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[109] Trevor Darrell,et al. Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[110] Jianxin Wu,et al. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[111] Xiangyu Zhang,et al. Channel Pruning for Accelerating Very Deep Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[112] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[113] Xiangyu Zhang,et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[114] Yong Yu,et al. Efficient Architecture Search by Network Transformation , 2017, AAAI.

[115] Scott A. Mahlke,et al. Scalpel: Customizing DNN pruning to the underlying hardware parallelism , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[116] W. Dally,et al. SCNN , 2017 .

[117] David J. Palframan,et al. Scalpel , 2017 .

[118] Patrick Judd,et al. Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks , 2017, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).

[119] Vivienne Sze,et al. Using Dataflow to Optimize Energy Efficiency of Deep Neural Network Accelerators , 2017, IEEE Micro.

[120] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.

[121] Tadahiro Kuroda,et al. BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS , 2017, 2017 Symposium on VLSI Circuits.

[122] Leibo Liu,et al. A 1.06-to-5.09 TOPS/W reconfigurable hybrid-neural-network processor for deep learning applications , 2017, 2017 Symposium on VLSI Circuits.

[123] Meng-Fan Chang,et al. A 462GOPs/J RRAM-based nonvolatile intelligent processor for energy harvesting IoE system featuring nonvolatile logics and processing-in-memory , 2017, 2017 Symposium on VLSI Technology.

[124] William J. Dally,et al. SCNN: An accelerator for compressed-sparse convolutional neural networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[125] Charbel Sakr,et al. PredictiveNet: An energy-efficient convolutional neural network via zero prediction , 2017, 2017 IEEE International Symposium on Circuits and Systems (ISCAS).

[126] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[127] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[128] Leibo Liu,et al. Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns , 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[129] Trevor N. Mudge,et al. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge , 2017, ASPLOS.

[130] Christoforos E. Kozyrakis,et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.

[131] Alexei A. Efros,et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[132] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.

[133] Vivienne Sze,et al. Towards closing the energy gap between HOG and CNN features for embedded vision (Invited paper) , 2017, 2017 IEEE International Symposium on Circuits and Systems (ISCAS).

[134] Daisuke Miyashita,et al. LogNet: Energy-efficient neural networks using logarithmic computation , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[135] Quoc V. Le,et al. Large-Scale Evolution of Image Classifiers , 2017, ICML.

[136] Rahul Sukthankar,et al. Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.

[137] Jian Sun,et al. Deep Learning with Low Precision by Half-Wave Gaussian Quantization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[138] Xiaowei Li,et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[139] Yiran Chen,et al. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[140] Marian Verhelst,et al. 14.5 Envision: A 0.26-to-10TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable Convolutional Neural Network processor in 28nm FDSOI , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[141] Sebastian Thrun,et al. Dermatologist-level classification of skin cancer with deep neural networks , 2017, Nature.

[142] Tomas Pfister,et al. Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[143] Aaron Klein,et al. Towards Automatically-Tuned Neural Networks , 2016, AutoML@ICML.

[144] Philip Heng Wai Leong,et al. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.

[145] Vivienne Sze,et al. Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[146] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[147] Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[148] An Chen,et al. A review of emerging non-volatile memory (NVM) technologies and applications , 2016 .

[149] Andreas Moshovos,et al. Bit-Pragmatic Deep Neural Network Computing , 2016, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[150] Vincent Dumoulin,et al. Deconvolution and Checkerboard Artifacts , 2016 .

[151] Shaoli Liu,et al. Cambricon-X: An accelerator for sparse neural networks , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[152] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[153] Amnon Shashua,et al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving , 2016, ArXiv.

[154] Roland Siegwart,et al. From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[155] Ran El-Yaniv,et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations , 2016, J. Mach. Learn. Res..

[156] Christian Ledig,et al. Is the deconvolution layer the same as a convolutional layer? , 2016, ArXiv.

[157] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[158] Ying Zhang,et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks , 2016, INTERSPEECH.

[159] Kevin Petrecca,et al. Neural networks improve brain cancer detection with Raman spectroscopy in the presence of operating room light artifacts , 2016, Journal of biomedical optics.

[160] Hanan Samet,et al. Pruning Filters for Efficient ConvNets , 2016, ICLR.

[161] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[162] Gu-Yeon Wei,et al. Fathom: reference workloads for modern deep learning methods , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).

[163] Yurong Chen,et al. Dynamic Network Surgery for Efficient DNNs , 2016, NIPS.

[164] Yiran Chen,et al. Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.

[165] Xiaoou Tang,et al. Accelerating the Super-Resolution Convolutional Neural Network , 2016, ECCV.

[166] Shuicheng Yan,et al. Training Skinny Deep Neural Networks with Iterative Hard Thresholding Methods , 2016, ArXiv.

[167] Daniel Rueckert,et al. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[168] Shuchang Zhou,et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients , 2016, ArXiv.

[169] Sudhakar Yalamanchili,et al. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[170] Dayong Wang,et al. Deep Learning for Identifying Metastatic Breast Cancer , 2016, ArXiv.

[171] Yu Wang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[172] Lin Zhong,et al. RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[173] Natalie D. Enright Jerger,et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[174] Luca Benini,et al. YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights , 2016, 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).

[175] Naveen Verma,et al. A machine-learning classifier implemented in a standard 6T SRAM array , 2016, 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits).

[176] Marian Verhelst,et al. A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets , 2016, 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits).

[177] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[178] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[179] Nassir Navab,et al. Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[180] David K. Gifford,et al. Convolutional neural network architectures for predicting DNA–protein binding , 2016, Bioinform..

[181] Vivienne Sze,et al. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[182] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.

[183] Ashok Veeraraghavan,et al. ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks Using Angle Sensitive Pixels , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[184] Vivienne Sze,et al. FAST: A Framework to Accelerate Super-Resolution Processing on Compressed Videos , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[185] Andrew S. Cassidy,et al. Convolutional networks for fast, energy-efficient neuromorphic computing , 2016, Proceedings of the National Academy of Sciences.

[186] Li Fei-Fei,et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[187] Gökmen Tayfun,et al. Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices: Design Considerations , 2016, Front. Neurosci..

[188] Matthew Richardson,et al. Do Deep Convolutional Nets Really Need to be Deep and Convolutional? , 2016, ICLR.

[189] Ali Farhadi,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[190] Gert Cauwenberghs,et al. Neuromorphic architectures with electronic synapses , 2016, 2016 17th International Symposium on Quality Electronic Design (ISQED).

[191] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[192] Daisuke Miyashita,et al. Convolutional Neural Networks using Logarithmic Data Representation , 2016, ArXiv.

[193] Joel Emer,et al. Eyeriss: an Energy-efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks Accessed Terms of Use , 2022 .

[194] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[195] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[196] Soheil Ghiasi,et al. Hardware-oriented Approximation of Convolutional Neural Networks , 2016, ArXiv.

[197] Yoshua Bengio,et al. BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 , 2016, ArXiv.

[198] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[199] V. Sze,et al. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks , 2016, IEEE Journal of Solid-State Circuits.

[200] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[201] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[202] Margaret Martonosi,et al. DeSC: Decoupled supply-compute communication management for heterogeneous architectures , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[203] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[204] Sepp Hochreiter,et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[205] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[206] Eunhyeok Park,et al. Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications , 2015, ICLR.

[207] George Karypis,et al. Tensor-matrix products with a compressed sparse tensor , 2015, IA3@SC.

[208] Yoshua Bengio,et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.

[209] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[210] Andrew Lavin,et al. Fast Algorithms for Convolutional Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[211] Sergey Levine,et al. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[212] O. Troyanskaya,et al. Predicting effects of noncoding variants with deep learning–based sequence model , 2015, Nature Methods.

[213] B. Frey,et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning , 2015, Nature Biotechnology.

[214] Tianshi Chen,et al. ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[215] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[216] Luca P. Carloni,et al. An analysis of accelerator coupling in heterogeneous architectures , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[217] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.

[218] Luca Benini,et al. Origami: A Convolutional Network Accelerator , 2015, ACM Great Lakes Symposium on VLSI.

[219] Seunghoon Hong,et al. Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[220] Jianxiong Xiao,et al. DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[221] Yixin Chen,et al. Compressing Neural Networks with the Hashing Trick , 2015, ICML.

[222] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[223] Hoi-Jun Yoo,et al. 4.6 A1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[224] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[225] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.

[226] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[227] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.

[228] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[229] Naveen Verma,et al. 18.4 A matrix-multiplying ADC implementing a machine-learning classifier directly with data conversion , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[230] B. Frey,et al. The human splicing code reveals new insights into the genetic determinants of disease , 2015, Science.

[231] Xiaoou Tang,et al. Image Super-Resolution Using Deep Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[232] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[233] Ivan V. Oseledets,et al. Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition , 2014, ICLR.

[234] Samira Ebrahimi Kahou,et al. FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[235] Benjamin Graham,et al. Fractional Max-Pooling , 2014, ArXiv.

[236] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[237] Farnood Merrikh-Bayat,et al. Training and operation of an integrated neuromorphic network based on metal-oxide memristors , 2014, Nature.

[238] Thomas Brox,et al. Learning to Generate Chairs, Tables and Cars with Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[239] Trevor Darrell,et al. Fully convolutional networks for semantic segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[240] Paul R. Prucnal,et al. Broadcast and Weight: An Integrated Network For Scalable Photonic Spike Processing , 2014, Journal of Lightwave Technology.

[241] John Tran,et al. cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.

[242] Vivienne Sze,et al. Energy-efficient HOG-based object detection at 1080HD 60 fps with multi-scale support , 2014, 2014 IEEE Workshop on Signal Processing Systems (SiPS).

[243] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[244] Jason Cong,et al. Minimizing Computation in Convolutional Neural Networks , 2014, ICANN.

[245] Xiaoou Tang,et al. Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[246] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[247] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[248] Andrew S. Cassidy,et al. A million spiking-neuron integrated circuit with a scalable communication network and interface , 2014, Science.

[249] Berin Martini,et al. A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[250] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[251] Aaron C. Courville,et al. Generative Adversarial Networks , 2014, 1406.2661.

[252] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[253] Jason Cong,et al. Accelerator-rich architectures: Opportunities and progresses , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[254] Xiaohui Zhang,et al. Improving deep neural network acoustic models using generalized maxout networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[255] Joan Bruna,et al. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation , 2014, NIPS.

[256] Mark Horowitz,et al. 1.1 Computing's energy problem (and what we can do about it) , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).

[257] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.

[258] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[259] Rich Caruana,et al. Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[260] Yann LeCun,et al. Fast Training of Convolutional Networks through FFTs , 2013, ICLR.

[261] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[262] Qiang Chen,et al. Network In Network , 2013, ICLR.

[263] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[264] Henk Corporaal,et al. Memory-centric accelerator design for Convolutional Neural Networks , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[265] Yann LeCun,et al. Regularization of Neural Networks using DropConnect , 2013, ICML.

[266] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI.

[267] Geoffrey Zweig,et al. Recent advances in deep learning for speech research at Microsoft , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[268] Tara N. Sainath,et al. Deep convolutional neural networks for LVCSR , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[269] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[270] Albert Wang,et al. A 180nm CMOS image sensor with on-chip optoelectronic image compression , 2012, Proceedings of the IEEE 2012 Custom Integrated Circuits Conference.

[271] François Fleuret,et al. Exact Acceleration of Linear Object Detectors , 2012, ECCV.

[272] J. Jeddeloh,et al. Hybrid memory cube new DRAM architecture increases density and performance , 2012, 2012 Symposium on VLSI Technology (VLSIT).

[273] Christoforos E. Kozyrakis,et al. Towards energy-proportional datacenter memory with mobile DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[274] B. Ramkumar,et al. Low-Power and Area-Efficient Carry Select Adder , 2012, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[275] Graham W. Taylor,et al. Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[276] Honglak Lee,et al. Unsupervised learning of hierarchical representations with convolutional deep belief networks , 2011, Commun. ACM.

[277] Bill Dally,et al. Power, Programmability, and Granularity: The Challenges of ExaScale Computing , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[278] Heng-Yuan Lee,et al. A 4Mb embedded SLC resistive-RAM macro with 7.2ns read-write random-access time and 160ns MLC-access capability , 2011, 2011 IEEE International Solid-State Circuits Conference.

[279] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[280] Wayne Luk,et al. Towards an embedded biologically-inspired machine vision processor , 2010, 2010 International Conference on Field-Programmable Technology.

[281] Jiale Liang,et al. Cross-Point Memory Array Without Cell Selectors—Device Characteristics and Data Storage Pattern Dependencies , 2010, IEEE Transactions on Electron Devices.

[282] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[283] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.

[284] Srihari Cadambi,et al. A dynamically configurable coprocessor for convolutional neural networks , 2010, ISCA.

[285] Roberto Bez,et al. A 90nm 4Mb embedded phase-change memory with 1.2V 12ns read access time and 1MB/s write throughput , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[286] William J. Dally,et al. The GPU Computing Era , 2010, IEEE Micro.

[287] Srihari Cadambi,et al. A Massively Parallel Coprocessor for Convolutional Neural Networks , 2009, 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors.

[288] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.

[289] Antonio Torralba,et al. Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[290] John R. Gilbert,et al. On the representation and multiplication of hypersparse matrices , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[291] Rich Caruana,et al. Model compression , 2006, KDD '06.

[292] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[293] Bernard Carlos Widrow,et al. Thinking about thinking: the discovery of the LMS algorithm , 2005, IEEE Signal Process. Mag..

[294] 裕幸飯田,et al. International Technology Roadmap for Semiconductors 2003の要求清浄度について－シリコンウエハ表面と雰囲気環境に要求される清浄度, 分析方法の現状について－ , 2004 .

[295] Norbert Wehn,et al. Embedded DRAM Development: Technology, Physical Design, and Application Issues , 2001, IEEE Des. Test Comput..

[296] Amy Hsiu-Fen Chou,et al. Flash Memories , 2000, The VLSI Handbook.

[297] S. Hochreiter,et al. Long Short-Term Memory , 1997, Neural Computation.

[298] Volker Tresp,et al. Early Brain Damage , 1996, NIPS.

[299] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[300] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[301] Russell Reed,et al. Pruning algorithms-a survey , 1993, IEEE Trans. Neural Networks.

[302] Joan L. Mitchell,et al. JPEG: Still Image Data Compression Standard , 1992 .

[303] D. Williamson. Dynamically scaled fixed point arithmetic , 1991, [1991] IEEE Pacific Rim Conference on Communications, Computers and Signal Processing Conference Proceedings.

[304] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.

[305] David H. Bailey,et al. Using Strassen's algorithm to accelerate the solution of linear systems , 1991, The Journal of Supercomputing.

[306] Ehud D. Karnin,et al. A simple procedure for pruning back-propagation trained neural networks , 1990, IEEE Trans. Neural Networks.

[307] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci..

[308] I. Guyon,et al. Handwritten digit recognition: applications of neural network chips and automatic learning , 1989, IEEE Communications Magazine.

[309] Janowsky,et al. Pruning versus clipping in neural networks. , 1989, Physical review. A, General physics.

[310] N. Takagi,et al. A high-speed multiplier using a redundant binary adder tree , 1987 .

[311] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.

[312] James E. Smith. Decoupled access/execute computer architectures , 1982, ISCA '98.

[313] Nicolas Halbwachs,et al. Automatic discovery of linear restraints among variables of a program , 1978, POPL.

[314] Leslie Lamport,et al. The parallel execution of DO loops , 1974, CACM.

[315] L. Chua. Memristor-The missing circuit element , 1971 .

[316] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..

[317] Richard M. Karp,et al. The Organization of Computations for Uniform Recurrence Equations , 1967, JACM.

[318] William F. Tinney,et al. Techniques for Exploiting the Sparsity or the Network Admittance Matrix , 1963 .

[319] J. Little. A Proof for the Queuing Formula: L = λW , 1961 .

[320] Mary Wootters,et al. The N3XT Approach to Energy-Efficient Abundant-Data Computing , 2019, Proceedings of the IEEE.

[321] Wafer-Scale Deep Learning , 2019, 2019 IEEE Hot Chips 31 Symposium (HCS).

[322] A. Parashar,et al. Stitch-X: An Accelerator Architecture for Exploiting Unstructured Sparsity in Deep Neural Networks , 2018 .

[323] Quoc V. Le,et al. Searching for Activation Functions , 2018, arXiv.

[324] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018 .

[325] J. Emer,et al. Understanding the Limitations of Existing Energy-Efficient Design Approaches for Deep Neural Networks , 2018 .

[326] Dirk Englund,et al. Deep learning with coherent nanophotonic circuits , 2017, 2017 Fifth Berkeley Symposium on Energy Efficient Electronic Systems & Steep Transistors Workshop (E3S).

[327] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[328] Tsu-Jae King Liu,et al. There's plenty of room at the top , 2017, 2017 IEEE 30th International Conference on Micro Electro Mechanical Systems (MEMS).

[329] Vivienne Sze,et al. Hardware for machine learning: Challenges and opportunities , 2017, 2017 IEEE Custom Integrated Circuits Conference (CICC).

[330] Patrick Judd,et al. Stripes: Bit-serial deep neural network computing , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[331] S. Simon Wong,et al. 24.2 A 2.5GHz 7.7TOPS/W switched-capacitor matrix multiplier with co-designed local memory in 40nm , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).

[332] Mathias Beike,et al. Digital Integrated Circuits A Design Perspective , 2016 .

[333] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[334] D. Ditzel,et al. Low-cost 3D chip stacking with ThruChip wireless connections , 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).

[335] Endong Wang,et al. Intel Math Kernel Library , 2014 .

[336] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[337] Itu-T and Iso Iec Jtc. Advanced video coding for generic audiovisual services , 2010 .

[338] A. Krizhevsky. Convolutional Deep Belief Networks on CIFAR-10 , 2010 .

[339] Samuel Williams,et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures , 2008 .

[340] Sang Joon Kim,et al. A Mathematical Theory of Communication , 2006 .

[341] Yann LeCun,et al. The mnist database of handwritten digits , 2005 .

[342] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[343] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.

[344] Jorge Herbert de Lira,et al. Two-Dimensional Signal and Image Processing , 1989 .

[345] Yann LeCun,et al. Optimal Brain Damage , 1989, NIPS.

[346] Michael C. Mozer,et al. Using Relevance to Reduce Network Size Automatically , 1989 .

[347] Bernard Widrow,et al. Adaptive switching circuits , 1988 .

[348] S. Winograd. Arithmetic complexity of computations , 1980 .

[349] Xiaomei Yang. Rounding Errors in Algebraic Processes , 1964, Nature.