WACO: Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program
[1] Clayton D. Scott,et al. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] James Demmel,et al. CoSA: Scheduling by Constrained Optimization for Spatial Accelerators , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[3] Karima Benatchba,et al. A Deep Learning Based Cost Model for Automatic Code Optimization , 2021, MLSys.
[4] Vikas Chandra,et al. Mind mappings: enabling efficient algorithm-accelerator mapping space search , 2021, ASPLOS.
[5] Niladrish Chatterjee,et al. Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures , 2021, 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[6] Shoaib Kamil,et al. A sparse iteration space transformation framework for sparse tensor algebra , 2020, Proc. ACM Program. Lang..
[7] L. Gan,et al. SpTFS: Sparse Tensor Format Selection for MTTKRP via Deep Learning , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Yu Wang,et al. GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Erich Elsen,et al. Sparse GPU Kernels for Deep Learning , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] V. Sze,et al. Efficient Processing of Deep Neural Networks , 2020, Synthesis Lectures on Computer Architecture.
[11] Cody Hao Yu,et al. Ansor: Generating High-Performance Tensor Programs for Deep Learning, 2020, OSDI.
[12] Yun Liang,et al. FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System , 2020, ASPLOS.
[13] Shulong Tan,et al. Fast Item Ranking under Neural Network based Measures , 2020, WSDM.
[14] Wang Chen,et al. Enabling Runtime SpMV Format Selection through an Overhead Conscious Method , 2020, IEEE Transactions on Parallel and Distributed Systems.
[15] Alexander Aiken,et al. TASO: optimizing deep learning computation with automatic generation of graph substitutions , 2019, SOSP.
[16] Frédo Durand,et al. Learning to optimize halide with tree search and random programs , 2019, ACM Trans. Graph..
[17] Cesare Alippi,et al. Spectral Clustering with Graph Neural Networks for Graph Pooling , 2019, ICML.
[18] Heidi K. Thornquist,et al. Polynomial Preconditioned GMRES to Reduce Communication in Parallel Computing , 2019, ArXiv.
[19] Brucek Khailany,et al. Timeloop: A Systematic Approach to DNN Accelerator Evaluation , 2019, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[20] P. Sadayappan,et al. Adaptive sparse tiling for sparse matrix multiplication , 2019, PPoPP.
[21] Michael Carbin,et al. Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks , 2018, ICML.
[22] Thierry Moreau,et al. Learning to Optimize Tensor Programs , 2018, NeurIPS.
[23] Yue Zhao,et al. Overhead-Conscious Format Selection for SpMV-Based Applications , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[24] Shoaib Kamil,et al. Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code , 2018, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[25] Saman P. Amarasinghe,et al. Format abstraction for sparse tensor algebra compilers , 2018, Proc. ACM Program. Lang..
[26] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[27] Yue Zhao,et al. Bridging the gap between deep learning and sparse matrix format selection , 2018, PPoPP.
[28] Jonathan Ragan-Kelley,et al. Halide, 2017.
[29] Shoaib Kamil,et al. The tensor algebra compiler , 2017, Proc. ACM Program. Lang..
[30] Laurens van der Maaten,et al. Submanifold Sparse Convolutional Networks , 2017, ArXiv.
[31] Xuemin Lin,et al. Approximate Nearest Neighbor Search on High Dimensional Data — Experiments, Analyses, and Improvement , 2016, IEEE Transactions on Knowledge and Data Engineering.
[32] Jonathan Ragan-Kelley,et al. Automatically scheduling halide image processing pipelines , 2016, ACM Trans. Graph..
[33] Franz Franchetti,et al. Mathematical foundations of the GraphBLAS , 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[34] Wojciech Matusik,et al. Simit, 2016, ACM Trans. Graph..
[35] Yury A. Malkov,et al. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[36] Anders Logg,et al. The FEniCS Project Version 1.5, 2015.
[37] Srinivasan Parthasarathy,et al. Automatic Selection of Sparse Matrix Representation on GPUs , 2015, ICS.
[38] Mary W. Hall,et al. Loop and data transformations for sparse matrix code , 2015, PLDI.
[39] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[40] Shoaib Kamil,et al. OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[41] David D. Cox,et al. Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures , 2013, ICML.
[42] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI.
[43] Ninghui Sun,et al. SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication , 2013, PLDI.
[44] Xing Liu,et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors , 2013, ICS '13.
[45] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[46] Timothy A. Davis,et al. Algorithm 915, SuiteSparseQR: Multifrontal multithreaded rank-revealing sparse QR factorization , 2011, TOMS.
[47] Richard W. Vuduc,et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs , 2010, PPoPP '10.
[48] Gregory N. Hullender,et al. Learning to rank using gradient descent , 2005, ICML.
[49] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[50] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[51] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[52] Sameer A. Nene,et al. A simple algorithm for nearest neighbor search in high dimensions, 1997.
[53] Samuel J. Kaufman,et al. Learned TPU Cost Model for XLA Tensor Programs, 2019.