[1] Paolo Bientinesi,et al. HPTT: a high-performance tensor transposition C++ library , 2017, ARRAY@PLDI.
[2] Lars Karlsson,et al. Parallel algorithms for tensor completion in the CP format , 2016, Parallel Comput..
[3] F. L. Hitchcock. The Expression of a Tensor or a Polyadic as a Sum of Products , 1927 .
[4] Sriram Krishnamoorthy,et al. Toward generalized tensor algebra for ab initio quantum chemistry methods , 2019, ARRAY@PLDI.
[5] John F. Canny,et al. Big data analytics with small footprint: squaring the cloud , 2013, KDD.
[6] John R. Gilbert,et al. Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments , 2011, SIAM J. Sci. Comput..
[7] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[8] Prateek Jain,et al. Low-rank matrix completion using alternating minimization , 2012, STOC '13.
[9] David B. Skillicorn,et al. Questions and Answers about BSP , 1997, Sci. Program..
[10] Torsten Hoefler,et al. Scaling Betweenness Centrality using Communication-Efficient Sparse Matrix Multiplication , 2016, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] James Demmel,et al. Communication optimal parallel multiplication of sparse random matrices , 2013, SPAA.
[12] Peter Ahrens,et al. Tensor Algebra Compilation with Workspaces , 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[13] Tamara G. Kolda,et al. Tensor Decompositions and Applications , 2009, SIAM Rev..
[14] Rasmus Pagh,et al. The Input/Output Complexity of Sparse Matrix Multiplication , 2014, ESA.
[15] Trevor J. Hastie,et al. Matrix completion and low-rank SVD via fast alternating least squares , 2014, J. Mach. Learn. Res..
[16] S. Hirata. Tensor Contraction Engine: Abstraction and Automated Parallel Implementation of Configuration-Interaction, Coupled-Cluster, and Many-Body Perturbation Theories , 2003 .
[17] Leonid Oliker,et al. Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[18] John F. Stanton,et al. A massively parallel tensor contraction framework for coupled-cluster computations , 2014, J. Parallel Distributed Comput..
[19] Saman P. Amarasinghe,et al. Format abstraction for sparse tensor algebra compilers , 2018, Proc. ACM Program. Lang..
[20] Fred G. Gustavson,et al. Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition , 1978, TOMS.
[21] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[22] James Bennett,et al. The Netflix Prize , 2007 .
[23] Andrea Montanari,et al. Matrix Completion from Noisy Entries , 2009, J. Mach. Learn. Res..
[24] Shoaib Kamil,et al. The tensor algebra compiler , 2017, Proc. ACM Program. Lang..
[25] Christina Freytag,et al. Using MPI: Portable Parallel Programming with the Message-Passing Interface , 2016 .
[26] Andrzej Cichocki,et al. Fast Alternating LS Algorithms for High Order CANDECOMP/PARAFAC Tensor Factorizations , 2013, IEEE Transactions on Signal Processing.
[27] George Karypis,et al. An Exploration of Optimization Algorithms for High Performance Tensor Completion , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[28] Nikos D. Sidiropoulos,et al. SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[29] Jimeng Sun,et al. Model-Driven Sparse CP Decomposition for Higher-Order Tensors , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[30] Oded Schwartz,et al. Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication , 2015, SPAA.
[31] Torsten Hoefler,et al. Sparse Tensor Algebra as a Parallel Programming Model , 2015, ArXiv.
[32] Jason Duell,et al. Productivity and performance using partitioned global address space languages , 2007, PASCO '07.
[33] Raf Vandebril,et al. Computing the Gradient in Optimization Algorithms for the CP Decomposition in Constant Memory through Tensor Blocking , 2015, SIAM J. Sci. Comput..
[34] Jimeng Sun,et al. HiCOO: Hierarchical Storage of Sparse Tensors , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[35] Albert Cohen,et al. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions , 2018, ArXiv.
[36] Jack Dongarra,et al. ScaLAPACK user's guide , 1997 .
[37] Daniel Kats,et al. Sparse tensor framework for implementation of general local correlation methods. , 2013, The Journal of chemical physics.
[38] Bora Uçar,et al. Scalable sparse tensor decompositions in distributed memory systems , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[39] Justus A. Calvin,et al. Scalable task-based algorithm for multiplication of block-rank-sparse matrices , 2015, IA3@SC.
[40] Gaël Varoquaux,et al. The NumPy Array: A Structure for Efficient Numerical Computation , 2011, Computing in Science & Engineering.
[41] Inderjit S. Dhillon,et al. Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems , 2012, 2012 IEEE 12th International Conference on Data Mining.
[42] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[43] Eric Jones,et al. SciPy: Open Source Scientific Tools for Python , 2001 .
[44] Robert J. Harrison,et al. Global arrays: A nonuniform memory access programming model for high-performance computers , 1996, The Journal of Supercomputing.
[45] Bora Uçar,et al. Parallel Candecomp/Parafac Decomposition of Sparse Tensors Using Dimension Trees , 2018, SIAM J. Sci. Comput..
[46] P. Sadayappan,et al. Sampled Dense Matrix Multiplication for High-Performance Machine Learning , 2018, 2018 IEEE 25th International Conference on High Performance Computing (HiPC).
[47] George Karypis,et al. Tensor-matrix products with a compressed sparse tensor , 2015, IA3@SC.
[48] Grey Ballard,et al. Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product , 2017, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[49] Evgeny Epifanovsky,et al. New implementation of high‐level correlated methods using a general block tensor library for high‐performance electronic structure calculations , 2013, J. Comput. Chem..
[50] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[51] Ang Li,et al. PASTA: a parallel sparse tensor algorithm benchmark suite , 2019, CCF Transactions on High Performance Computing.
[52] Jieping Ye,et al. Tensor Completion for Estimating Missing Values in Visual Data , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[53] Stefan Behnel,et al. Cython: The Best of Both Worlds , 2011, Computing in Science & Engineering.
[54] John R. Gilbert,et al. On the representation and multiplication of hypersparse matrices , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[55] Peter J. Haas,et al. Large-scale matrix factorization with distributed stochastic gradient descent , 2011, KDD.
[56] Rainer Gemulla,et al. Distributed Matrix Completion , 2012, 2012 IEEE 12th International Conference on Data Mining.
[57] Justus A. Calvin,et al. Massively Parallel Implementation of Explicitly Correlated Coupled-Cluster Singles and Doubles Using TiledArray Framework. , 2016, The journal of physical chemistry. A.
[58] Grey Ballard,et al. Shared-memory parallelization of MTTKRP for dense tensors , 2018, PPOPP.
[59] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.