3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration
Cong Hao | Cole Hawkins | Zheng Zhang | Kaiqi Zhang | Yao Chen
[1] Jinjun Xiong, et al. FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge, 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[2] Hiroki Nakahara, et al. A fully connected layer elimination for a binarized convolutional neural network on an FPGA, 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[3] Wonyong Sung, et al. FPGA based implementation of deep neural networks using on-chip memory only, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Dilin Wang, et al. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, 2016, NIPS.
[5] Shenghuo Zhu, et al. Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM, 2017, AAAI.
[6] Hongjun Wang, et al. Real-Time Object Tracking System on FPGAs, 2011, 2011 Symposium on Application Accelerators in High-Performance Computing.
[7] Alexander Novikov, et al. Tensorizing Neural Networks, 2015, NIPS.
[8] Ivan Oseledets, et al. Tensor-Train Decomposition, 2011, SIAM J. Sci. Comput.
[9] Zheng Zhang, et al. Bayesian Tensorized Neural Networks with Automatic Rank Selection, 2019, Neurocomputing.
[10] Bin Liu, et al. Ternary Weight Networks, 2016, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Nicholas Caldwell, et al. Scalable high-performance architecture for convolutional ternary neural networks on FPGA, 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[12] Satoshi Nakamura, et al. Compressing recurrent neural network with tensor train, 2017, 2017 International Joint Conference on Neural Networks (IJCNN).
[13] Yoshua Bengio, et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013, ArXiv.
[14] Philip Heng Wai Leong, et al. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference, 2016, FPGA.
[15] Luca Benini, et al. Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Codesign, 2021, IEEE Design & Test.
[16] Kai Zhang, et al. T-DLA: An Open-source Deep Learning Accelerator for Ternarized DNN Models on Embedded FPGA, 2019, 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
[17] Tamara G. Kolda, et al. Tensor Decompositions and Applications, 2009, SIAM Rev.
[18] Danilo P. Mandic, et al. Tucker Tensor Layer in Fully Connected Neural Networks, 2019, ArXiv.
[19] H. T. Kung, et al. Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices, 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).
[20] Soheil Ghiasi, et al. Hardware-oriented Approximation of Convolutional Neural Networks, 2016, ArXiv.
[21] Valentin Khrulkov, et al. Tensorized Embedding Layers for Efficient Model Compression, 2019, ArXiv.
[22] Christopher J. Hillar, et al. Most Tensor Problems Are NP-Hard, 2009, JACM.
[23] Yang Liu, et al. Two-Step Quantization for Low-bit Neural Networks, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[24] Jinjun Xiong, et al. DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs, 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[25] Yoshua Bengio, et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations, 2015, NIPS.
[26] Bo Yuan, et al. Compressing Recurrent Neural Networks Using Hierarchical Tucker Tensor Decomposition, 2020, ArXiv.
[27] Deming Chen, et al. µL2Q: An Ultra-Low Loss Quantization Method for DNN Compression, 2019, 2019 International Joint Conference on Neural Networks (IJCNN).
[28] Cole Hawkins, et al. On-FPGA Training with Ultra Memory Reduction: A Low-Precision Tensor Method, 2021, ArXiv.
[29] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[30] Yao Chen, et al. Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs, 2019, FPGA.
[31] Yue Wang, et al. E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings, 2019, NeurIPS.
[32] Alexander Novikov, et al. Ultimate tensorization: compressing convolutional and FC layers alike, 2016, ArXiv.
[33] Tao Li, et al. VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization, 2020, IEEE Transactions on Computers.
[34] Ivan V. Oseledets, et al. Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition, 2014, ICLR.
[35] Di He, et al. Machine learning on FPGAs to face the IoT revolution, 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[36] Ce Zhu, et al. Tensor rank learning in CP decomposition via convolutional neural network, 2019, Signal Process. Image Commun.
[37] Frédéric Pétrot, et al. Ternary neural networks for resource-efficient AI applications, 2016, 2017 International Joint Conference on Neural Networks (IJCNN).
[38] Rajesh Gupta, et al. Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs, 2017, FPGA.
[39] Yoshua Bengio, et al. BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1, 2016, ArXiv.
[40] Yinghai Lu, et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems, 2019, ArXiv.
[41] Chong Wang, et al. Stochastic variational inference, 2012, J. Mach. Learn. Res.
[42] Zheng Zhang, et al. Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination, 2020, SIAM J. Math. Data Sci.
[43] Yu Wang, et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, 2016, FPGA.