SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition

Sparse Tucker Decomposition (STD) algorithms learn a core tensor and a group of factor matrices to obtain an optimal low-rank representation for a High-Order, High-Dimension, and Sparse Tensor (HOHDST). However, existing STD algorithms suffer from intermediate-variable explosion: the intermediate variables, formed by Khatri-Rao products, Kronecker products, and matrix-matrix multiplications, scale with the whole set of elements in the sparse tensor. This bottleneck prevents a deep fusion of efficient computation with big-data platforms. To overcome it, a novel stochastic optimization strategy (SGD_Tucker) is proposed for STD, which automatically divides the high-dimension intermediate variables into small batches of intermediate matrices. Specifically, SGD_Tucker operates only on randomly selected small samples rather than on the whole set of elements, while maintaining the overall accuracy and convergence rate. In practice, SGD_Tucker features two distinct advancements over the state of the art. First, SGD_Tucker prunes the communication overhead for the core tensor in distributed settings. Second, the low data dependence of SGD_Tucker enables fine-grained parallelization, allowing it to achieve lower computational overhead at the same accuracy. Experimental results show that SGD_Tucker runs at least 2X faster than the state of the art.
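The paper's exact update rules are not reproduced here; the following is a minimal sketch of the underlying idea, assuming a third-order tensor, a squared-error loss with L2 regularization, and illustrative shapes and hyperparameters (all names and values below are assumptions, not the authors' code). Each SGD step touches a single observed nonzero entry and only ever forms rank-sized intermediates, never the full Khatri-Rao or Kronecker products over the whole tensor:

```python
import numpy as np

def sgd_tucker_step(core, factors, idx, val, lr=0.01, reg=0.05):
    """One stochastic update from a single observed nonzero entry.

    core    : (R1, R2, R3) ndarray, the core tensor G
    factors : list [A (I x R1), B (J x R2), C (K x R3)]
    idx     : (i, j, k) coordinates of the sampled entry
    val     : observed tensor value at idx
    """
    i, j, k = idx
    a, b, c = factors[0][i], factors[1][j], factors[2][k]  # rank-sized rows

    # Predicted entry: x_hat = G x_1 a x_2 b x_3 c, computed via
    # rank-sized contractions instead of full Kronecker products.
    Gc = np.tensordot(core, c, axes=([2], [0]))            # (R1, R2)
    err = a @ Gc @ b - val

    # Gradients of the regularized squared error w.r.t. the three touched
    # factor rows and the core tensor; every object here is rank-sized.
    grad_a = err * (Gc @ b) + reg * a
    grad_b = err * (a @ Gc) + reg * b
    grad_c = err * np.einsum('pqr,p,q->r', core, a, b) + reg * c
    grad_G = err * np.einsum('p,q,r->pqr', a, b, c) + reg * core

    factors[0][i] -= lr * grad_a
    factors[1][j] -= lr * grad_b
    factors[2][k] -= lr * grad_c
    core -= lr * grad_G

# Toy usage: fit a rank-(4,4,4) model to 2000 sampled entries of a
# 50 x 40 x 30 tensor (sizes and epoch count are illustrative).
rng = np.random.default_rng(0)
I, J, K, R = 50, 40, 30, 4
core = 0.1 * rng.standard_normal((R, R, R))
factors = [0.1 * rng.standard_normal((n, R)) for n in (I, J, K)]
nnz = [((rng.integers(I), rng.integers(J), rng.integers(K)),
        rng.standard_normal()) for _ in range(2000)]
for epoch in range(10):
    for t in rng.permutation(len(nnz)):
        sgd_tucker_step(core, factors, *nnz[t])
```

Because each step reads and writes only three factor rows plus the small core tensor, steps on entries with disjoint coordinates can proceed in parallel with little contention, which illustrates the low data dependence the abstract attributes to SGD_Tucker.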
