Flexible Multiple-Precision Fused Arithmetic Units for Efficient Deep Learning Computation