Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach
暂无分享,去创建一个
Vivek Sarkar | Hyoukjun Kwon | Tushar Krishna | Michael Pellauer | Angshuman Parashar | Prasanth Chatarasi | Vivek Sarkar | A. Parashar | Michael Pellauer | T. Krishna | Prasanth Chatarasi | Hyoukjun Kwon
[1] Christian Szegedy,et al. DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[2] Yu Cao,et al. Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks , 2017, FPGA.
[3] Jason Cong,et al. Minimizing Computation in Convolutional Neural Networks , 2014, ICANN.
[4] Hyoukjun Kwon,et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects , 2018, ASPLOS.
[5] Vivienne Sze,et al. Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks , 2018, ArXiv.
[6] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Vivienne Sze,et al. Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs , 2019, 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[8] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[9] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Asit K. Mishra,et al. From high-level deep neural models to FPGAs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[11] Mostafa Mahmoud,et al. Diffy: a Déjà vu-Free Differential Deep Neural Network Accelerator , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[12] Vivek Sarkar,et al. Oil and Water Can Mix: An Integration of Polyhedral and AST-Based Transformations , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Keshav Pingali,et al. Data-Centric Transformations for Locality Enhancement , 2001, International Journal of Parallel Programming.
[14] Xiaowei Li,et al. C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).
[15] Rajesh K. Gupta,et al. SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[16] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[17] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[18] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[19] Tianshi Chen,et al. ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[20] Joel Emer,et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.
[21] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[22] Vivek Sarkar,et al. An analytical model for loop tiling and its solution , 2000, 2000 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS (Cat. No.00EX422).
[23] Brucek Khailany,et al. Timeloop: A Systematic Approach to DNN Accelerator Evaluation , 2019, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[24] Norman P. Jouppi,et al. CACTI 6.0: A Tool to Model Large Caches , 2009 .
[25] Vivek Sarkar,et al. Analytical Bounds for Optimal Tile Size Selection , 2012, CC.
[26] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[27] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[28] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[29] Benoît Meister,et al. Polyhedral Optimization of TensorFlow Computation Graphs , 2017, ESPT/VPA@SC.
[30] Jason Cong,et al. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[31] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Uday Bondhugula,et al. Combined iterative and model-driven optimization in an automatic parallelization framework , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[33] Sriram Krishnamoorthy,et al. Analytical modeling of cache behavior for affine programs , 2018, Proc. ACM Program. Lang..
[34] Soumith Chintala,et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.
[35] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[36] Christoforos E. Kozyrakis,et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.
[37] Xiaowei Li,et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[38] Vivek Sarkar,et al. Automatic selection of high-order transformations in the IBM XL FORTRAN compilers , 1997, IBM J. Res. Dev..
[39] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[40] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[41] Keshav Pingali,et al. An experimental evaluation of tiling and shackling for memory hierarchy management , 1999, ICS '99.
[42] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[43] Vivienne Sze,et al. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices , 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[44] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[45] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[46] William J. Dally,et al. SCNN: An accelerator for compressed-sparse convolutional neural networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).