Machine Learning Systems are Stuck in a Rut

In this paper we argue that systems for numerical computing are stuck in a local basin of performance and programmability. Systems researchers are doing an excellent job improving the performance of 5-year-old benchmarks, but gradually making it harder to explore innovative machine learning research ideas. We explain how the evolution of hardware accelerators favors compiler back ends that hyper-optimize large monolithic kernels, show how this reliance on high-performance but inflexible kernels reinforces the dominant style of programming model, and argue that these programming abstractions lack expressiveness, maintainability, and modularity, all of which hinder research progress. We conclude by noting promising directions in the field, and advocate steps to advance progress towards high-performance general-purpose numerical computing systems on modern accelerators.
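
To illustrate the whole-tensor, fixed-kernel programming style the abstract critiques, the following minimal sketch (not taken from the paper) shows how a typical framework program is written as a composition of a few large, pre-optimized kernels. It assumes JAX with its XLA back end, but the same shape applies to TensorFlow or PyTorch; the function and variable names are illustrative.

```python
# A minimal sketch of the dominant "monolithic kernel" programming style:
# the model is expressed as a composition of a small number of large,
# pre-optimized whole-tensor operations.
import jax
import jax.numpy as jnp

def conv_relu(x, w):
    # jax.lax.conv dispatches to one monolithic, heavily tuned convolution
    # kernel; its inner loops are not visible or modifiable from this
    # level of the program.
    y = jax.lax.conv(x, w, window_strides=(1, 1), padding="SAME")
    return jnp.maximum(y, 0.0)  # elementwise ReLU, another whole-tensor op

x = jnp.ones((8, 3, 32, 32))   # NCHW batch of images (illustrative shapes)
w = jnp.ones((16, 3, 3, 3))    # OIHW filter bank
y = jax.jit(conv_relu)(x, w)   # the composition is compiled as a unit by XLA
```

The point of the sketch is that the convolution's inner loops live inside one opaque library kernel: a researcher who needs a slightly different operation, say a convolution over an unconventional data layout, cannot adjust it from this level of the program and instead depends on the kernel library or compiler back end catching up.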
