Andreas Moshovos | Gennady Pekhimenko | Mostafa Mahmoud | Ciaran Bannon | Isak Edo Vivancos | Ali Hadi Zadeh | Omar Mohamed Awad | Anand Jayarajan
[1] Jascha Sohl-Dickstein, et al. Measuring the Effects of Data Parallelism on Neural Network Training, 2018, J. Mach. Learn. Res.
[2] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[3] Frank Hutter, et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.
[4] Dongyoung Kim, et al. ZeNA: Zero-Aware Neural Network Accelerator, 2018, IEEE Design & Test.
[5] Natalie D. Enright Jerger, et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[6] Khalil Sima'an, et al. Multi30K: Multilingual English-German Image Descriptions, 2016, VL@ACL.
[7] Eunhyeok Park, et al. Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[8] Ali Farhadi, et al. Defending Against Neural Fake News, 2019, NeurIPS.
[9] Yibo Zhu, et al. A generic communication scheduler for distributed DNN training acceleration, 2019, SOSP.
[10] J. Ticehurst. Cacti, 1983.
[11] Dylan Malone Stuart, et al. Laconic Deep Learning Inference Acceleration, 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[12] Kunle Olukotun, et al. High-Accuracy Low-Precision Training, 2018, ArXiv.
[13] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[14] Mark Horowitz, et al. Energy-Efficient Floating-Point Unit Design, 2011, IEEE Transactions on Computers.
[15] David Blaauw, et al. Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[16] Swagath Venkataramani, et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks, 2018, ArXiv.
[17] Xin Wang, et al. Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks, 2017, NIPS.
[18] Hoi-Jun Yoo, et al. UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision, 2019, IEEE Journal of Solid-State Circuits.
[19] Joseph A. Konstan, et al. The MovieLens Datasets, 2015.
[20] Xuehai Qian, et al. HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array, 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[21] Xu Sun, et al. Memorized Sparse Backpropagation, 2019, Neurocomputing.
[22] Charbel Sakr, et al. Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks, 2019, ICLR.
[23] Amar Phanishayee, et al. Gist: Efficient Data Encoding for Deep Neural Network Training, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[24] Pran Kurup, et al. Logic synthesis using Synopsys (2nd ed.), 1997.
[25] Jitesh R. Shinde, et al. VLSI implementation of bit serial architecture based multiplier in floating point arithmetic, 2015, 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI).
[26] Quoc V. Le, et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, 2018, ArXiv.
[27] Alexander M. Rush, et al. MASR: A Modular Accelerator for Sparse RNNs, 2019, 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[28] Jyh-Charn Liu, et al. Translating math formula images to LaTeX sequences using deep neural networks with sequence-level training, 2019, International Journal on Document Analysis and Recognition (IJDAR).
[29] Nikhil R. Devanur, et al. PipeDream: generalized pipeline parallelism for DNN training, 2019, SOSP.
[30] Shaoli Liu, et al. Cambricon: An Instruction Set Architecture for Neural Networks, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[31] Cordelia Schmid, et al. End-to-End Incremental Learning, 2018, ECCV.
[32] Pradeep Dubey, et al. Mixed Precision Training of Convolutional Neural Networks using Integer Operations, 2018, ICLR.
[33] Shaoli Liu, et al. Cambricon-X: An accelerator for sparse neural networks, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[34] Yuxin Peng, et al. Error-Driven Incremental Learning in Deep Convolutional Neural Network for Large-Scale Image Classification, 2014, ACM Multimedia.
[35] Pritish Narayanan, et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[36] Joel Emer, et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks, 2016, CARN.
[37] Onur Mutlu, et al. Base-delta-immediate compression: Practical data compression for on-chip caches, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[38] Constantine Bekas, et al. Incremental Training of Deep Convolutional Neural Networks, 2018, AutoML@PKDD/ECML.
[39] Tianshi Chen, et al. Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[40] Stephen W. Keckler, et al. Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks, 2017, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[41] H. T. Kung, et al. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization, 2018, ASPLOS.
[42] Vivienne Sze, et al. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices, 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[43] Andreas Moshovos, et al. Bit-Pragmatic Deep Neural Network Computing, 2016, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[44] Walter H. Ku, et al. A bit-serial floating-point complex multiplier-accumulator for fault-tolerant digital signal processing arrays, 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[45] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[46] William J. Dally, et al. SCNN: An accelerator for compressed-sparse convolutional neural networks, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[47] Anand Jayarajan, et al. Priority-based parameter propagation for distributed deep neural network training, 2019.
[48] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[49] Pradeep Dubey, et al. SCALEDEEP: A scalable compute architecture for learning and evaluating deep networks, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[50] Xin Wang, et al. Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization, 2019, ICML.
[51] Peter W. Cook, et al. Second-generation RISC floating point with multiply-add fused, 1990.
[52] David Goldberg, et al. What every computer scientist should know about floating-point arithmetic, 1991, CSUR.
[53] F. Maxwell Harper, et al. The MovieLens Datasets: History and Context, 2016, TIIS.
[54] Patrick Judd, et al. Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks, 2017, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[55] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[56] Daniel Brand, et al. Training Deep Neural Networks with 8-bit Floating Point Numbers, 2018, NeurIPS.
[57] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[58] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[59] Christoforos E. Kozyrakis, et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory, 2017, ASPLOS.
[60] Dipankar Das, et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[61] Alexander Heinecke, et al. Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations, 2019, 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH).
[62] Patrick Judd, et al. ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning, 2019, MICRO.
[63] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[64] Xu Sun, et al. meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting, 2017, ICML.
[65] Hadi Esmaeilzadeh, et al. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network, 2017, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[66] Sangeetha Abdu Jyothi, et al. Communication Scheduling as a First-Class Citizen in Distributed Machine Learning Systems, 2018, ArXiv.
[67] Babak Falsafi, et al. Training DNNs with Hybrid Block Floating Point, 2018, NeurIPS.
[68] Mengjia Yan, et al. UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[69] Patrick Judd, et al. Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks, 2019, ASPLOS.
[70] Hao Wu, et al. Mixed Precision Training, 2017, ICLR.
[71] Engin Ipek, et al. Enabling Scientific Computing on Memristive Accelerators, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[72] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, ArXiv:1610.02132.
[73] T. N. Vijaykumar, et al. SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks, 2019, MICRO.
[74] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.
[75] Sangeetha Abdu Jyothi, et al. TicTac: Accelerating Distributed Deep Learning with Communication Scheduling, 2018, MLSys.