Automatic Unsupervised Tensor Mining with Quality Assessment

Tensor decomposition is a popular tool for unsupervised modelling and mining of multi-aspect data. In an exploratory setting, where no labels or ground truth are available, how can we automatically decide how many components to extract? How can we assess the quality of our results, so that a domain expert can factor this quality measure into the interpretation of those results? In this paper, we introduce AutoTen, a novel automatic unsupervised tensor mining algorithm with minimal user intervention, which leverages and improves upon heuristics that assess result quality. We extensively evaluate AutoTen's performance on synthetic data, where it outperforms existing baselines on this very hard problem. Finally, we apply AutoTen to a variety of real datasets, providing insights and discoveries. We view this work as a step towards a fully automated, unsupervised tensor mining tool that can be easily adopted by practitioners in academia and industry.
