Adaptive Sketching for Fast and Convergent Canonical Polyadic Decomposition

This work considers the canonical polyadic decomposition (CPD) of tensors using proximally regularized sketched alternating least squares algorithms. First, it establishes a sublinear rate of convergence for proximally regularized sketched CPD algorithms under two natural conditions that many popular forms of sketching are known to satisfy. Second, it demonstrates that the iterative nature of CPD algorithms can be exploited to choose better-performing sketching rates. This is accomplished by introducing CPD-MWU, a proximally regularized sketched alternating least squares algorithm that adaptively selects the sketching rate at each iteration. On both synthetic and real data, we observe that for noisy tensors CPD-MWU produces decompositions of accuracy comparable to standard CPD in less time, often half the time; for ill-conditioned tensors, given the same time budget, CPD-MWU produces decompositions with an order-of-magnitude lower relative error. On a representative real-world dataset, CPD-MWU produces residual errors that are on average 20% lower than those of CPRAND-MIX and 44% lower than those of SPALS, two recent sketched CPD algorithms.
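
To make the two ingredients named above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a 3-way tensor, uniform row sampling as the sketch, and that "MWU" refers to a multiplicative weights update over a small set of candidate sketching rates; the abstract does not specify the sketching operator, the reward signal, or the update rule, so the function names `sketched_prox_update` and `mwu_step`, the parameters `mu` and `eta`, and the error-reduction reward are all hypothetical.

```python
import numpy as np

def sketched_prox_update(X, A_prev, B, C, rate, mu, rng):
    """One sketched, proximally regularized least-squares update of the mode-1 factor.

    Solves  min_A ||S (X1^T - Z A^T)||_F^2 + mu * ||A - A_prev||_F^2,
    where Z is the Khatri-Rao product of B and C and S keeps a uniform
    sample of its rows (an assumed sketch; other sketches are possible).
    """
    I, J, K = X.shape
    R = A_prev.shape[1]
    X1 = X.reshape(I, J * K)                 # column j*K + k holds X[:, j, k]
    s = min(J * K, max(R, int(round(rate * J * K))))  # keep at least R rows
    idx = rng.choice(J * K, size=s, replace=False)
    Zs = B[idx // K] * C[idx % K]            # sampled Khatri-Rao rows, shape (s, R)
    Xs = X1[:, idx]                          # matching tensor columns, shape (I, s)
    G = Zs.T @ Zs + mu * np.eye(R)           # regularized Gram matrix (symmetric)
    rhs = Xs @ Zs + mu * A_prev
    return np.linalg.solve(G, rhs.T).T       # equals rhs @ inv(G)

def mwu_step(weights, chosen, reward, eta=0.5):
    """Multiplicative weights update: boost the chosen rate by exp(eta * reward)."""
    w = weights.copy()
    w[chosen] *= np.exp(eta * reward)
    return w / w.sum()

# Small demonstration on an exact rank-3 tensor (synthetic, for illustration only).
rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((n, 3)) for n in (5, 6, 7))
X = np.einsum("ir,jr,kr->ijk", A, B, C)

rates = np.array([0.25, 0.5, 1.0])           # hypothetical candidate sketching rates
weights = np.full(len(rates), 1.0 / len(rates))
choice = rng.choice(len(rates), p=weights)   # draw a rate according to the weights

A_prev = np.zeros_like(A)                    # relative error of this start is 1.0
A_new = sketched_prox_update(X, A_prev, B, C, rates[choice], 1e-10, rng)

# Hypothetical reward: reduction in relative reconstruction error.
err_new = np.linalg.norm(X - np.einsum("ir,jr,kr->ijk", A_new, B, C)) / np.linalg.norm(X)
weights = mwu_step(weights, choice, 1.0 - err_new)
```

In a full algorithm the same update would cycle over all three factors, and the reward would more plausibly account for wall-clock cost as well as error reduction, so that cheaper sketching rates are favored when they make comparable progress. That trade-off is the motivation stated in the abstract, but the specific form used here is an assumption.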
