A Sparse Tensor Benchmark Suite for CPUs and GPUs
Jiajia Li | Ang Li | Xiaolong Wu | Kevin Barker | Catherine Olschanowsky | Mahesh Lakshminarasimhan
[1] Tamara G. Kolda,et al. Tensor Decompositions and Applications , 2009, SIAM Rev..
[2] Jimeng Sun,et al. HiCOO: Hierarchical Storage of Sparse Tensors , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] David A. Patterson,et al. The GAP Benchmark Suite , 2015, ArXiv.
[4] Fei Wang,et al. SPARTan: Scalable PARAFAC2 for Large & Sparse Data , 2017, KDD.
[5] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[6] Paolo Bientinesi,et al. HPTT: a high-performance tensor transposition C++ library , 2017, ARRAY@PLDI.
[7] Andrzej Cichocki,et al. Era of Big Data Processing: A New Approach via Tensor Networks and Tensor Decompositions , 2014, ArXiv.
[8] J. Chang,et al. Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition , 1970 .
[9] Kaivalya M. Dixit,et al. The SPEC benchmarks , 1991, Parallel Comput..
[10] J. Kruskal,et al. Candelinc: A general approach to multidimensional analysis of many-way arrays with linear constraints on parameters , 1980 .
[11] Hadi Fanaee-T,et al. SimTensor: A synthetic tensor data generator , 2016, ArXiv.
[12] Shoaib Kamil,et al. The tensor algebra compiler , 2017, Proc. ACM Program. Lang..
[13] Steve Plimpton,et al. FireHose Streaming Benchmarks , 2015 .
[14] Jiajia Li. Scalable tensor decompositions in high performance computing environments , 2018 .
[15] Andrzej Cichocki,et al. Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1 , 2016, ArXiv.
[16] Jesús Labarta,et al. A Framework for Performance Modeling and Prediction , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[17] Devin Matthews,et al. High-Performance Tensor Contraction without BLAS , 2016, ArXiv.
[18] Jimeng Sun,et al. Efficient and effective sparse tensor reordering , 2019, ICS.
[19] Srinivasan Parthasarathy,et al. Automatic Selection of Sparse Matrix Representation on GPUs , 2015, ICS.
[20] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[21] Brian W. Barrett,et al. Introducing the Graph 500 , 2010 .
[22] Mingyu Chen,et al. Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning , 2017, PPoPP.
[23] Bora Uçar,et al. Parallel Candecomp/Parafac Decomposition of Sparse Tensors Using Dimension Trees , 2018, SIAM J. Sci. Comput..
[24] Richard A. Lethin,et al. Highly Scalable Near Memory Processing with Migrating Threads on the Emu System Architecture , 2016, 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3).
[25] Richard W. Vuduc,et al. An Initial Characterization of the Emu Chick , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[26] Alexander Novikov,et al. Tensorizing Neural Networks , 2015, NIPS.
[27] Benoît Meister,et al. Efficient and scalable computations with sparse tensors , 2012, 2012 IEEE Conference on High Performance Extreme Computing.
[28] Nikos D. Sidiropoulos,et al. SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[29] Nikos D. Sidiropoulos,et al. Tensor Decomposition for Signal Processing and Machine Learning , 2016, IEEE Transactions on Signal Processing.
[30] Edoardo Di Napoli,et al. Towards an efficient use of the BLAS library for multilinear tensor contractions , 2013, Appl. Math. Comput..
[31] Anand D. Sarwate,et al. A Unified Optimization Approach for Sparse Tensor Operations on GPUs , 2017, 2017 IEEE International Conference on Cluster Computing (CLUSTER).
[32] Samuel Williams,et al. Auto-tuning performance on multicore computers , 2008 .
[33] Richard W. Vuduc,et al. Load-Balanced Sparse MTTKRP on GPUs , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[34] Andrzej Cichocki,et al. Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 1 Low-Rank Tensor Decompositions , 2016, Found. Trends Mach. Learn..
[35] Sriram Krishnamoorthy,et al. An efficient mixed-mode representation of sparse tensors , 2019, SC.
[36] Anima Anandkumar,et al. Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res..
[37] Xu Liu,et al. Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite , 2018, 2018 IEEE International Symposium on Workload Characterization (IISWC).
[38] Saman P. Amarasinghe,et al. Format abstraction for sparse tensor algebra compilers , 2018, Proc. ACM Program. Lang..
[39] Jack J. Dongarra,et al. The LINPACK Benchmark: past, present and future , 2003, Concurr. Comput. Pract. Exp..
[40] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[41] Samuel Williams,et al. Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis , 2014, PMBS@SC.
[42] Jimeng Sun,et al. Model-Driven Sparse CP Decomposition for Higher-Order Tensors , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[43] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[44] Ivan V. Oseledets,et al. Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition , 2014, ICLR.
[45] Ang Li,et al. PASTA: a parallel sparse tensor algorithm benchmark suite , 2019, CCF Transactions on High Performance Computing.
[46] L. Tucker,et al. Some mathematical notes on three-mode factor analysis , 1966, Psychometrika.
[47] Christos Faloutsos,et al. Kronecker Graphs: An Approach to Modeling Networks , 2008, J. Mach. Learn. Res..
[48] Richard W. Vuduc,et al. Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures , 2016, 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3).
[49] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[50] Olivier Richard,et al. CONCURRENCY AND COMPUTATION : PRACTICE AND EXPERIENCE , 2018 .
[51] Rasmus Bro,et al. The N-way Toolbox for MATLAB , 2000 .
[52] Ninghui Sun,et al. SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication , 2013, PLDI.
[53] Yisong Yue,et al. Long-term Forecasting using Tensor-Train RNNs , 2017, ArXiv.
[54] James Demmel,et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance , 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[55] Jimeng Sun,et al. An input-adaptive and in-place approach to dense tensor-times-matrix multiply , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[56] Georg Ofenbeck,et al. Applying the roofline model , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[57] Jimeng Sun,et al. Optimizing sparse tensor times matrix on GPUs , 2019, J. Parallel Distributed Comput..