Turbo-SMT: Parallel coupled sparse matrix-tensor factorizations and applications

How can we correlate the neural activity in the human brain, as it responds to typed words, with properties of these words (like 'edible', 'fits in hand')? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it parallelizes any CMTF algorithm and boosts its performance (up to 65-fold), while producing sparse and interpretable solutions. Additionally, we improve upon Alternating Least Squares (ALS), the workhorse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT by applying it to a Facebook dataset (users, 'friends', wall postings); there, Turbo-SMT spots spammer-like anomalies.
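
To make the coupled setting concrete, the following is a minimal sketch of a plain ALS solver for a three-way tensor X and a matrix Y that share their first mode (the 'nouns' dimension in BrainQ). It assumes the standard CMTF objective, min ||X - [[A, B, C]]||^2 + ||Y - A D^T||^2, with a dense, fully observed X. It is only an illustration of the kind of baseline solver a meta-method could wrap, not the Turbo-SMT algorithm or the improved ALS mentioned in the abstract. The names `khatri_rao` and `cmtf_als`, the rank, and the synthetic data sizes are all hypothetical choices made for this example.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: row (i, j) of the result is U[i] * V[j]."""
    r = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, r)

def cmtf_als(X, Y, rank, n_iter=200, seed=0):
    """Plain ALS for min ||X - [[A,B,C]]||^2 + ||Y - A D^T||^2 (A is shared)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C, D = (rng.standard_normal((n, rank)) for n in (I, J, K, Y.shape[1]))
    # Mode-n unfoldings in C order, matching the row ordering of khatri_rao.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        # A is fitted to both data sets: stack the tensor unfolding and Y.
        M = np.vstack([khatri_rao(B, C), D])
        A = np.linalg.lstsq(M, np.hstack([X1, Y]).T, rcond=None)[0].T
        # B and C only see the tensor; D only sees the matrix.
        B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T
        D = np.linalg.lstsq(A, Y, rcond=None)[0].T
    return A, B, C, D

# Tiny synthetic check: exact rank-2 data coupled along the first mode.
rng = np.random.default_rng(1)
A0, B0, C0, D0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4, 3))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)   # (6, 5, 4) tensor
Y = A0 @ D0.T                                 # (6, 3) matrix

A, B, C, D = cmtf_als(X, Y, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
loss = np.sum((X - X_hat) ** 2) + np.sum((Y - A @ D.T) ** 2)
rel_err = np.sqrt(loss / (np.sum(X ** 2) + np.sum(Y ** 2)))
print(f"relative reconstruction error: {rel_err:.2e}")
```

The shared factor A is what couples the two data sets: its update stacks the tensor unfolding and the matrix into one least-squares problem, so each row of A must explain both. Missing values, sparsity, and data that exceed main memory are deliberately left out of this sketch.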
