ALTO: adaptive linearized storage of sparse tensors
Fabio Checconi | Fabrizio Petrini | Teresa M. Ranadive | Jan Laukemann | Jesmin Jahan Tithi | Ahmed E. Helal | Jeewhan Choi
[1] Marcin Paprzycki,et al. On BLAS Operations with Recursively Stored Sparse Matrices , 2010, 2010 12th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing.
[2] Sparsh Mittal. A survey of techniques for designing and managing CPU register file , 2017, Concurr. Comput. Pract. Exp..
[3] Yue Zhao,et al. Bridging the gap between deep learning and sparse matrix format selection , 2018, PPoPP.
[4] Adam P. Harrison,et al. High Performance Rearrangement and Multiplication Routines for Sparse Tensor Arithmetic , 2018, SIAM J. Sci. Comput..
[5] Paolo Bientinesi,et al. Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection , 2016 .
[6] Emilio Ferrara,et al. Extracting the multi-timescale activity patterns of online financial markets , 2018, Scientific Reports.
[7] Joost VandeVondele,et al. Sparse matrix multiplication: The distributed block-compressed sparse row library , 2014, Parallel Comput..
[8] Anand D. Sarwate,et al. A Unified Optimization Approach for Sparse Tensor Operations on GPUs , 2017, 2017 IEEE International Conference on Cluster Computing (CLUSTER).
[9] Anima Anandkumar,et al. Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res..
[10] George Karypis,et al. Tensor-matrix products with a compressed sparse tensor , 2015, IA3@SC.
[11] Xing Liu,et al. Blocking Optimization Techniques for Sparse Tensor Computation , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[12] Jan Reineke,et al. uops.info: Characterizing Latency, Throughput, and Port Usage of Instructions on Intel Microarchitectures , 2018, ASPLOS.
[13] Nico Vervliet,et al. Tensorlab 3.0 — Numerical optimization strategies for large-scale constrained and coupled matrix/tensor factorization , 2016, 2016 50th Asilomar Conference on Signals, Systems and Computers.
[14] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[15] Weifeng Liu,et al. Parallel Transposition of Sparse Data Structures , 2016, ICS.
[16] Tamara G. Kolda,et al. Efficient MATLAB Computations with Sparse and Factored Tensors , 2007, SIAM J. Sci. Comput..
[17] J. H. Choi,et al. DFacTo: Distributed Factorization of Tensors , 2014, NIPS.
[18] Sriram Krishnamoorthy,et al. An efficient mixed-mode representation of sparse tensors , 2019, SC.
[19] Shuangzhe Liu,et al. Hadamard, Khatri-Rao, Kronecker and Other Matrix Products , 2008 .
[20] Gerhard Wellein,et al. Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels , 2019, 2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS).
[21] Tamara G. Kolda,et al. Tensor Decompositions and Applications , 2009, SIAM Rev..
[22] Jiajia Li,et al. Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory , 2021, PPoPP.
[23] Onur Mutlu,et al. Demystifying Complex Workload-DRAM Interactions: An Experimental Study , 2019, SIGMETRICS.
[24] Nikos D. Sidiropoulos,et al. Tensor Decomposition for Signal Processing and Machine Learning , 2016, IEEE Transactions on Signal Processing.
[25] Jeremy D. Frens,et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code , 1997, PPOPP '97.
[26] John R. Gilbert,et al. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks , 2009, SPAA '09.
[27] Christos Faloutsos,et al. HaTen2: Billion-scale tensor decompositions , 2015, 2015 IEEE 31st International Conference on Data Engineering.
[28] Nikos D. Sidiropoulos,et al. SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[29] Jimeng Sun,et al. Marble: high-throughput phenotyping from electronic health records via sparse nonnegative tensor factorization , 2014, KDD.
[30] Saman P. Amarasinghe,et al. Format abstraction for sparse tensor algebra compilers , 2018, Proc. ACM Program. Lang..
[31] Rob H. Bisseling,et al. Two-dimensional cache-oblivious sparse matrix-vector multiplication , 2011, Parallel Comput..
[32] T. van Amelsvoort. Bridging the Gap , 2014, Tijdschrift voor psychiatrie.
[33] Erik Elmroth,et al. Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software , 2004, SIAM Rev..
[34] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998 .
[35] Patrick Flick,et al. High Performance Streaming Tensor Decomposition , 2021, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[36] Richard W. Vuduc,et al. Load-Balanced Sparse MTTKRP on GPUs , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[37] Fred G. Gustavson,et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms , 1997, IBM J. Res. Dev..
[38] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[39] Peter Ahrens,et al. Sparse Tensor Transpositions , 2020, SPAA.
[40] Christos Faloutsos,et al. GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries , 2012, KDD.
[41] Michele Martone,et al. Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format , 2014, Parallel Comput..
[42] Nikos D. Sidiropoulos,et al. Streaming Tensor Factorization for Infinite Data Sources , 2018, SDM.
[43] Hadi Fanaee-T,et al. Tensor-based anomaly detection: An interdisciplinary survey , 2016, Knowl. Based Syst..
[44] Tamara G. Kolda,et al. Scalable Tensor Decompositions for Multi-aspect Data Mining , 2008, 2008 Eighth IEEE International Conference on Data Mining.
[45] Nikos D. Sidiropoulos,et al. Tensors for Data Mining and Data Fusion , 2016, ACM Trans. Intell. Syst. Technol..
[46] Jimeng Sun,et al. Efficient and effective sparse tensor reordering , 2019, ICS.
[47] George Karypis,et al. Accelerating the Tucker Decomposition with Compressed Sparse Tensors , 2017, Euro-Par.
[48] Alfio Lazzaro,et al. DBCSR: A Blocked Sparse Tensor Algebra Library , 2019, PARCO.
[49] L. Gan,et al. SpTFS: Sparse Tensor Format Selection for MTTKRP via Deep Learning , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[50] John F. Stanton,et al. A massively parallel tensor contraction framework for coupled-cluster computations , 2014, J. Parallel Distributed Comput..
[51] Mithuna Thottethodi,et al. Recursive Array Layouts and Fast Matrix Multiplication , 2002, IEEE Trans. Parallel Distributed Syst..
[52] Benoît Meister,et al. Efficient and scalable computations with sparse tensors , 2012, 2012 IEEE Conference on High Performance Extreme Computing.
[53] Zhen Xie,et al. IA-SpGEMM: an input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication , 2019, ICS.
[54] G. Peano. Sur une courbe, qui remplit toute une aire plane , 1890 .
[55] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[56] Rob H. Bisseling,et al. Cache-Oblivious Sparse Matrix--Vector Multiplication by Using Sparse Matrix Partitioning Methods , 2009, SIAM J. Sci. Comput..
[57] Jimeng Sun,et al. HiCOO: Hierarchical Storage of Sparse Tensors , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[58] Jimeng Sun,et al. Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics , 2015, KDD.
[59] Aart J. C. Bik,et al. Automatic Intra-Register Vectorization for the Intel® Architecture , 2002, International Journal of Parallel Programming.
[60] Tamara G. Kolda,et al. Software for Sparse Tensor Decomposition on Emerging Computing Architectures , 2018, SIAM J. Sci. Comput..