A Unified Optimization Approach for Sparse Tensor Operations on GPUs
[1] Bora Uçar,et al. Scalable sparse tensor decompositions in distributed memory systems , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] B. Khoromskij,et al. Tensor numerical methods in quantum chemistry: from Hartree-Fock to excitation energies , 2015, Physical Chemistry Chemical Physics: PCCP.
[3] Demetri Terzopoulos,et al. Multilinear Analysis of Image Ensembles: TensorFaces , 2002, ECCV.
[4] Tom Michael Mitchell,et al. Predicting Human Brain Activity Associated with the Meanings of Nouns , 2008, Science.
[5] Estevam R. Hruschka,et al. Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.
[6] Michael Garland,et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[7] Christos Faloutsos,et al. HaTen2: Billion-scale tensor decompositions , 2015, 2015 IEEE 31st International Conference on Data Engineering.
[8] Tamara G. Kolda,et al. Tensor Decompositions and Applications , 2009, SIAM Rev.
[9] James Demmel,et al. Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[10] Rasmus Bro,et al. The N-way Toolbox for MATLAB , 2000.
[11] Christos Faloutsos,et al. GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries , 2012, KDD.
[12] George Karypis,et al. Tensor-matrix products with a compressed sparse tensor , 2015, IA3@SC.
[13] Tamara G. Kolda,et al. Parallel Tensor Compression for Large-Scale Scientific Data , 2015, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[14] Anima Anandkumar,et al. Online tensor methods for learning latent variable models , 2013, J. Mach. Learn. Res.
[15] Shengen Yan,et al. StreamScan: fast scan algorithms for GPUs without global barrier synchronization , 2013, PPoPP '13.
[16] Shubhabrata Sengupta,et al. Efficient Parallel Scan Algorithms for GPUs , 2011.
[17] Anima Anandkumar,et al. Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res.
[18] Richard W. Vuduc,et al. Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures , 2016, 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3).
[19] Benoît Meister,et al. Efficient and scalable computations with sparse tensors , 2012, 2012 IEEE Conference on High Performance Extreme Computing.
[20] Bora Uçar,et al. High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors , 2016, 2016 45th International Conference on Parallel Processing (ICPP).
[21] Nikos D. Sidiropoulos,et al. SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[22] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[23] Martha Larson,et al. TFMAP: optimizing MAP for top-n context-aware recommendation , 2012, SIGIR '12.
[24] Steffen Staab,et al. PINTS: peer-to-peer infrastructure for tagging systems , 2008, IPTPS.
[25] M. Alex O. Vasilescu. Multilinear projection for face recognition via canonical decomposition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.
[26] J. H. Choi,et al. DFacTo: Distributed Factorization of Tensors , 2014, NIPS.
[27] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.
[28] Tamara G. Kolda,et al. Scalable Tensor Decompositions for Multi-aspect Data Mining , 2008, 2008 Eighth IEEE International Conference on Data Mining.