SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition

Sparse Tucker Decomposition (STD) algorithms learn a core tensor and a group of factor matrices to obtain an optimal low-rank representation for a High-Order, High-Dimension, and Sparse Tensor (HOHDST). However, existing STD algorithms suffer from intermediate-variable explosion: the intermediate variables, formed by Khatri-Rao products, Kronecker products, and matrix-matrix multiplications, scale with the whole set of elements in the sparse tensor. This bottleneck prevents a deep fusion of efficient computation with big-data platforms. To overcome it, a novel stochastic optimization strategy (SGD_Tucker) is proposed for STD, which automatically divides the high-dimension intermediate variables into small batches of intermediate matrices. Specifically, SGD_Tucker operates only on randomly selected small samples rather than on the whole set of elements, while maintaining the overall accuracy and convergence rate. In practice, SGD_Tucker features two distinct advancements over the state of the art. First, SGD_Tucker prunes the communication overhead for the core tensor in distributed settings. Second, the low data dependence of SGD_Tucker enables fine-grained parallelization, allowing it to achieve lower computational overhead at the same accuracy. Experimental results show that SGD_Tucker runs at least 2X faster than the state of the art.
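The paper's exact update rules are not reproduced here; the following is a minimal sketch of the underlying idea, assuming a third-order tensor, a squared-error loss with L2 regularization, and illustrative shapes and hyperparameters (all names and values below are assumptions, not the authors' code). Each SGD step touches a single observed nonzero entry and only ever forms rank-sized intermediates, never the full Khatri-Rao or Kronecker products over the whole tensor:

```python
import numpy as np

def sgd_tucker_step(core, factors, idx, val, lr=0.01, reg=0.05):
    """One stochastic update from a single observed nonzero entry.

    core    : (R1, R2, R3) ndarray, the core tensor G
    factors : list [A (I x R1), B (J x R2), C (K x R3)]
    idx     : (i, j, k) coordinates of the sampled entry
    val     : observed tensor value at idx
    """
    i, j, k = idx
    a, b, c = factors[0][i], factors[1][j], factors[2][k]  # rank-sized rows

    # Predicted entry: x_hat = G x_1 a x_2 b x_3 c, computed via
    # rank-sized contractions instead of full Kronecker products.
    Gc = np.tensordot(core, c, axes=([2], [0]))            # (R1, R2)
    err = a @ Gc @ b - val

    # Gradients of the regularized squared error w.r.t. the three touched
    # factor rows and the core tensor; every object here is rank-sized.
    grad_a = err * (Gc @ b) + reg * a
    grad_b = err * (a @ Gc) + reg * b
    grad_c = err * np.einsum('pqr,p,q->r', core, a, b) + reg * c
    grad_G = err * np.einsum('p,q,r->pqr', a, b, c) + reg * core

    factors[0][i] -= lr * grad_a
    factors[1][j] -= lr * grad_b
    factors[2][k] -= lr * grad_c
    core -= lr * grad_G

# Toy usage: fit a rank-(4,4,4) model to 2000 sampled entries of a
# 50 x 40 x 30 tensor (sizes and epoch count are illustrative).
rng = np.random.default_rng(0)
I, J, K, R = 50, 40, 30, 4
core = 0.1 * rng.standard_normal((R, R, R))
factors = [0.1 * rng.standard_normal((n, R)) for n in (I, J, K)]
nnz = [((rng.integers(I), rng.integers(J), rng.integers(K)),
        rng.standard_normal()) for _ in range(2000)]
for epoch in range(10):
    for t in rng.permutation(len(nnz)):
        sgd_tucker_step(core, factors, *nnz[t])
```

Because each step reads and writes only three factor rows plus the small core tensor, steps on entries with disjoint coordinates can proceed in parallel with little contention, which illustrates the low data dependence the abstract attributes to SGD_Tucker.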
