The Deep Learning Compiler: A Comprehensive Survey
Xiaoyan Liu | Hailong Yang | Mingzhen Li | Yi Liu | Qingxiao Sun | Xin You | Zhongzhi Luan | Depei Qian