cuTensor-Tubal: Efficient Primitives for Tubal-Rank Tensor Learning Operations on GPUs
Xiaodong Wang | Tao Zhang | Xiao-Yang Liu | Anwar Walid
[1] Feng Qian,et al. Tensor Super-resolution for Seismic Data , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Prasanna Balaprakash,et al. Generating Efficient Tensor Contractions for GPUs , 2015, 2015 44th International Conference on Parallel Processing.
[3] Dhabaleswar K. Panda,et al. Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast , 2019, IEEE Transactions on Parallel and Distributed Systems.
[4] Lieven Eeckhout,et al. HeteroCore GPU to Exploit TLP-Resource Diversity , 2019, IEEE Transactions on Parallel and Distributed Systems.
[5] Eric L. Miller,et al. Tensor-Based Formulation and Nuclear Norm Regularization for Multienergy Computed Tomography , 2013, IEEE Transactions on Image Processing.
[6] Adam Zalcman,et al. TensorNetwork: A Library for Physics and Machine Learning , 2019, ArXiv.
[7] Tao Deng,et al. Tensor Sensing for RF Tomographic Imaging , 2018, 2018 IEEE International Conference on Multimedia and Expo (ICME).
[8] D. Rubin,et al. Statistical Analysis with Missing Data , 1988 .
[9] Gerik Scheuermann,et al. Fast and Memory Efficient GPU-Based Rendering of Tensor Data , 2011 .
[10] Rafael Ballester-Ripoll,et al. Multiresolution Volume Filtering in the Tensor Compressed Domain , 2018, IEEE Transactions on Visualization and Computer Graphics.
[11] Jack J. Dongarra,et al. Performance, Design, and Autotuning of Batched GEMM for GPUs , 2016, ISC.
[12] Tao Zhang,et al. High-Performance Homomorphic Matrix Completion on GPUs , 2019, 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[13] Xiaodong Wang,et al. Adaptive Sampling of RF Fingerprints for Fine-Grained Indoor Localization , 2015, IEEE Transactions on Mobile Computing.
[14] Johan A. K. Suykens,et al. Learning with tensors: a framework based on convex optimization and spectral regularization , 2014, Machine Learning.
[15] Xiaodong Wang,et al. Low-Tubal-Rank Tensor Completion Using Alternating Minimization , 2016, IEEE Transactions on Information Theory.
[16] Misha Elena Kilmer,et al. Novel Methods for Multilinear Data Completion and De-noising Based on Tensor-SVD , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[17] Tamara G. Kolda,et al. Tensor Decompositions and Applications , 2009, SIAM Rev..
[18] Misha Elena Kilmer,et al. Third-Order Tensors as Operators on Matrices: A Theoretical and Computational Framework with Applications in Imaging , 2013, SIAM J. Matrix Anal. Appl..
[19] U. N. Niranjan,et al. Tensor Contractions with Extended BLAS Kernels on CPU and GPU , 2016, HiPC 2016.
[20] Thomas B. Rolinger,et al. Performance challenges for heterogeneous distributed tensor decompositions , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).
[21] J. H. Choi,et al. DFacTo: Distributed Factorization of Tensors , 2014, NIPS.
[22] M. Kilmer,et al. Factorization strategies for third-order tensors , 2011 .
[23] Lin-Ching Chang,et al. GPU acceleration of nonlinear diffusion tensor estimation using CUDA and MPI , 2014, Neurocomputing.
[24] Xiaodong Wang,et al. LS-Decomposition for Robust Recovery of Sensory Big Data , 2018, IEEE Transactions on Big Data.
[25] Kenli Li,et al. CUSNTF: A Scalable Sparse Non-negative Tensor Factorization Model for Large-scale Industrial Applications on Multi-GPU , 2018, CIKM.
[26] Bora Uçar,et al. High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors , 2016, 2016 45th International Conference on Parallel Processing (ICPP).
[27] David E. Keyes,et al. Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression , 2017, Parallel Comput..
[28] Bingsheng He,et al. Scalable GPU Virtualization with Dynamic Sharing of Graphics Memory Space , 2018, IEEE Transactions on Parallel and Distributed Systems.
[29] Ivan Oseledets,et al. Tensor-Train Decomposition , 2011, SIAM J. Sci. Comput..
[30] Christos Faloutsos,et al. GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries , 2012, KDD.
[31] Ying-Jer Kao,et al. GPU accelerated tensor contractions in the plaquette renormalization scheme , 2011 .
[32] Zheng Shou,et al. Deep Tensor ADMM-Net for Snapshot Compressive Imaging , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Dmitry I. Lyakh. An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU , 2015, Comput. Phys. Commun..
[34] Tao Zhang,et al. cuTensor-tubal: Optimized GPU Library for Low-tubal-rank Tensors , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Hong Chen,et al. GPUTENSOR: Efficient tensor factorization for context-aware recommendations , 2015, Inf. Sci..
[36] Markku Hauta-Kasari,et al. Nonnegative Tensor Factorization Accelerated Using GPGPU , 2011, IEEE Transactions on Parallel and Distributed Systems.
[37] Tao Zhang,et al. High-Performance Tensor Decoder on GPUs for Wireless Camera Networks in IoT , 2019, 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[38] Nikos D. Sidiropoulos,et al. Tensors for Data Mining and Data Fusion , 2016, ACM Trans. Intell. Syst. Technol..
[39] Andrzej Cichocki,et al. Tensor Decompositions for Signal Processing Applications: From two-way to multiway component analysis , 2014, IEEE Signal Processing Magazine.
[40] Dmitry I. Lyakh,et al. cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs , 2017, ArXiv.
[41] Hongtao Lu,et al. Efficient Multi-Dimensional Tensor Sparse Coding Using t-Linear Combination , 2018, AAAI.
[42] David A. Patterson,et al. A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution , 2018, IEEE Micro.
[43] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[44] Athanasios V. Vasilakos,et al. CDC: Compressive Data Collection for Wireless Sensor Networks , 2015, IEEE Transactions on Parallel and Distributed Systems.