CoSA: Scheduling by Constrained Optimization for Spatial Accelerators
James Demmel | John Wawrzynek | Aravind Kalaiah | Qijing Huang | Yakun Sophia Shao | Grace Dinh | Minwoo Kang | Thomas Norell
[1] Monica S. Lam,et al. Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..
[2] Vivienne Sze,et al. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices , 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[3] Timothy M. Jones,et al. Janus: Statically-Driven and Profile-Guided Automatic Dynamic Binary Parallelisation , 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[4] Mingyu Gao,et al. Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators , 2018, ASPLOS.
[5] Rastislav Bodík,et al. Chlorophyll: Synthesis-Aided Compiler for Low-Power Spatial Architectures by Phitchaya Mangpo Phothilimthana , 2015 .
[6] Hyoukjun Kwon,et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Programmable Interconnects , 2018 .
[7] Eric S. Chung,et al. A Configurable Cloud-Scale DNN Processor for Real-Time AI , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[8] Rajeev Alur,et al. Search-based program synthesis , 2018, Commun. ACM.
[9] Shoaib Kamil,et al. OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[10] Kunle Olukotun,et al. Plasticine: A reconfigurable architecture for parallel patterns , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[11] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[12] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[13] A. Parashar,et al. Marvel: A Data-centric Compiler for DNN Operators on Spatial Accelerators , 2020 .
[14] Alexander Aiken,et al. Automatic generation of peephole superoptimizers , 2006, ASPLOS XII.
[15] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[17] Joel Emer,et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.
[18] William J. Dally,et al. SCNN: An accelerator for compressed-sparse convolutional neural networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[19] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[20] Rajeev Barua,et al. Heterogeneous memory management for embedded systems , 2001, CASES '01.
[21] Hongbin Zheng,et al. Polly – Polyhedral optimization in LLVM , 2012 .
[22] Uday Bondhugula,et al. The Pluto+ Algorithm , 2016, ACM Trans. Program. Lang. Syst..
[23] David Cox,et al. Triton: an intermediate language and compiler for tiled neural network computations , 2019, MAPL@PLDI.
[24] Yinghai Lu,et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems , 2019, ArXiv.
[25] Vikas Chandra,et al. Mind mappings: enabling efficient algorithm-accelerator mapping space search , 2021, ASPLOS.
[26] Christopher Torng,et al. INVITED: A Modular Digital VLSI Flow for High-Productivity SoC Design , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[27] Christoforos E. Kozyrakis,et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.
[28] An Wang,et al. Swizzle Inventor: Data Movement Synthesis for GPU Kernels , 2019, ASPLOS.
[29] Sheng-Chun Kao,et al. GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm , 2020, 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD).
[30] Jian Weng,et al. Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign , 2018, PACT.
[31] Jason Cong,et al. An efficient and versatile scheduling algorithm based on SDC formulation , 2006, 2006 43rd ACM/IEEE Design Automation Conference.
[32] Venkatesh Akella,et al. AutoTM: Automatic Tensor Movement in Heterogeneous Memory Systems using Integer Linear Programming , 2020, ASPLOS.
[33] Jae-Gon Lee,et al. 7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC , 2019, 2019 IEEE International Solid-State Circuits Conference (ISSCC).
[34] Shoaib Kamil,et al. Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code , 2018, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[35] William J. Dally,et al. Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture , 2019, MICRO.
[36] Jonathan Ragan-Kelley,et al. Automatically scheduling halide image processing pipelines , 2016, ACM Trans. Graph..
[37] Brucek Khailany,et al. Timeloop: A Systematic Approach to DNN Accelerator Evaluation , 2019, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[38] Karthikeyan Sankaralingam,et al. A general constraint-centric scheduling framework for spatial architectures , 2013, PLDI.
[39] Lawrence D. Jackel,et al. Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car , 2017, ArXiv.
[40] S. Alexander Chin,et al. An Architecture-Agnostic Integer Linear Programming Approach to CGRA Mapping , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[41] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[42] Christoforos E. Kozyrakis,et al. TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators , 2019, ASPLOS.
[43] James Demmel,et al. Communication-Optimal Tilings for Projective Nested Loops with Arbitrary Bounds , 2020, SPAA.
[44] Carole-Jean Wu,et al. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation , 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[45] Shaoli Liu,et al. Cambricon-X: An accelerator for sparse neural networks , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[46] Tianshi Chen,et al. ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[47] Albert Cohen,et al. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions , 2018, ArXiv.
[48] Ilya Levin,et al. Self-checking of FPGA-based control units , 1999, Proceedings Ninth Great Lakes Symposium on VLSI.
[49] Cédric Bastoul,et al. Predictive Modeling in a Polyhedral Optimization Space , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[50] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[51] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[52] Pradeep Dubey,et al. SCALEDEEP: A scalable compute architecture for learning and evaluating deep networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[53] Aviral Shrivastava,et al. dMazeRunner , 2019, ACM Trans. Embed. Comput. Syst..
[54] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[55] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[56] Richard Veras,et al. When polyhedral transformations meet SIMD code generation , 2013, PLDI.
[57] Bruce Jacob,et al. DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.
[58] Elnar Hajiyev,et al. PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[59] Glenn Henry,et al. High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs Industrial Product , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[60] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[61] Armando Solar-Lezama,et al. Programming by sketching for bit-streaming programs , 2005, PLDI '05.
[62] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Frédo Durand,et al. Learning to optimize halide with tree search and random programs , 2019, ACM Trans. Graph..
[65] Alexander Aiken,et al. Beyond Data and Model Parallelism for Deep Neural Networks , 2018, SysML.
[66] Dipankar Das,et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).