swTensor: accelerating tensor decomposition on Sunway architecture
Guangwen Yang | Hailong Yang | Depei Qian | Zhongzhi Luan | Xiaogang Zhong | Lin Gan
[1] Lee Sael, et al. High-Performance Tucker Factorization on Heterogeneous Platforms, 2019, IEEE Transactions on Parallel and Distributed Systems.
[2] Andrzej Cichocki, et al. Tensor Decompositions for Signal Processing Applications: From two-way to multiway component analysis, 2014, IEEE Signal Processing Magazine.
[3] J. H. Choi, et al. DFacTo: Distributed Factorization of Tensors, 2014, NIPS.
[4] Tamara G. Kolda, et al. Software for Sparse Tensor Decomposition on Emerging Computing Architectures, 2018, SIAM J. Sci. Comput.
[5] Kathryn A. Dowsland, et al. Simulated Annealing, 1989, Encyclopedia of GIS.
[6] Depei Qian, et al. Accelerating tile low-rank GEMM on sunway architecture: POSTER, 2019, CF.
[7] Christos Faloutsos, et al. FlexiFaCT: Scalable Flexible Factorization of Coupled Tensors on Hadoop, 2014, SDM.
[8] Christos Faloutsos, et al. HaTen2: Billion-scale tensor decompositions, 2015, 2015 IEEE 31st International Conference on Data Engineering.
[9] Xing Liu, et al. Blocking Optimization Techniques for Sparse Tensor Computation, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[10] Christos Faloutsos, et al. Mining billion-scale tensors: algorithms and discoveries, 2016, The VLDB Journal.
[11] Gene H. Golub, et al. Matrix computations, 1983.
[12] Hans-Peter Kriegel, et al. Factorizing YAGO: scalable machine learning for linked data, 2012, WWW.
[13] Nikos D. Sidiropoulos, et al. Tensor Decomposition for Signal Processing and Machine Learning, 2016, IEEE Transactions on Signal Processing.
[14] Richard W. Vuduc, et al. Load-Balanced Sparse MTTKRP on GPUs, 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[15] George Karypis, et al. Sparse Tensor Factorization on Many-Core Processors with High-Bandwidth Memory, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[16] Meng Zhang, et al. Redesigning LAMMPS for Peta-Scale and Hundred-Billion-Atom Simulation on Sunway TaihuLight, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] Kai-Wei Chang, et al. Typed Tensor Decomposition of Knowledge Bases for Relation Extraction, 2014, EMNLP.
[18] Weifeng Liu, et al. swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures, 2018, PPoPP.
[19] Xin Liu, et al. Towards Efficient SpMV on Sunway Manycore Architectures, 2018, ICS.
[20] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[21] Tat-Seng Chua, et al. Fast Matrix Factorization for Online Recommendation with Implicit Feedback, 2016, SIGIR.
[22] Milan Sonka, et al. Image Processing, Analysis and Machine Vision, 1993, Springer US.
[23] Wei Zhang, et al. Simulating the Wenchuan Earthquake with Accurate Surface Topography on Sunway TaihuLight, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[24] Tamir Hazan, et al. Non-negative tensor factorization with applications to statistics and computer vision, 2005, ICML.
[25] Anand D. Sarwate, et al. A Unified Optimization Approach for Sparse Tensor Operations on GPUs, 2017, 2017 IEEE International Conference on Cluster Computing (CLUSTER).
[26] Xuelong Li, et al. Tensors in Image Processing and Computer Vision, 2009, Advances in Pattern Recognition.
[27] Pierre Comon, et al. Multiarray Signal Processing: Tensor decomposition meets compressed sensing, 2010, ArXiv.
[28] Maryam Mehri Dehnavi, et al. CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms, 2018, ICPP.
[29] Parker Allen Tew, et al. An investigation of sparse tensor formats for tensor libraries, 2016.
[30] Guangwen Yang, et al. Large-Scale Hierarchical k-means for Heterogeneous Many-Core Supercomputers, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[31] Tamara G. Kolda, et al. Tensor Decompositions and Applications, 2009, SIAM Rev.
[32] James Lin, et al. Benchmarking SW26010 Many-Core Processor, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[33] Depei Qian, et al. Multi-role SpTRSV on Sunway Many-Core Architecture, 2018, 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[34] Christos Faloutsos, et al. GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries, 2012, KDD.
[35] Jungwoo Lee, et al. BIGtensor: Mining Billion-Scale Tensor Made Easy, 2016, CIKM.
[36] Jimeng Sun, et al. Optimizing sparse tensor times matrix on GPUs, 2019, J. Parallel Distributed Comput.
[37] Depei Qian, et al. swMR: A Framework for Accelerating MapReduce Applications on Sunway Taihulight, 2018.
[38] Guangwen Yang, et al. Massively Scaling Seismic Processing on Sunway TaihuLight Supercomputer, 2020, IEEE Transactions on Parallel and Distributed Systems.
[39] Cheng Lei, et al. Tri-focal tensor-based multiple video synchronization with subframe optimization, 2006, IEEE Transactions on Image Processing.
[40] F. Maxwell Harper, et al. The MovieLens Datasets: History and Context, 2016, TIIS.
[41] F. L. Hitchcock. The Expression of a Tensor or a Polyadic as a Sum of Products, 1927.
[42] Richard W. Vuduc, et al. Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures, 2016, 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3).
[43] Tamara G. Kolda, et al. Scalable Tensor Factorizations for Incomplete Data, 2010, ArXiv.
[44] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.