NORMO: A new method for estimating the number of components in CP tensor decomposition

Abstract: Tensor decompositions are multi-way analysis tools that have been successfully applied in a wide range of fields. However, some challenges remain little explored, namely: when applying tensor decomposition techniques, what should we expect from the result, and how can we evaluate its quality? When the number of components is suitable, little redundancy is expected in the decomposition result. Based on this assumption, we propose a new method, NORMO, which aims at estimating the number of components in CANDECOMP/PARAFAC (CP) decomposition so that no redundancy is observed in the result. To the best of our knowledge, this work is the first attempt to tackle this problem. According to our experiments, the number of non-redundant components estimated by NORMO is among the most accurate estimates of the true CP number of components in both synthetic and real-world tensor datasets, which validates the rationale guiding our method. Moreover, NORMO is more efficient than most of its competitors. Additionally, our method can be used to discover multiple levels of granularity in the patterns found.
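The abstract does not say how redundancy between components is measured, so the following is only an illustrative sketch and not NORMO itself. It assumes, hypothetically, that two CP components are redundant when their factor vectors are strongly correlated in every mode. The function name `count_redundant_pairs` and the `threshold` value are invented for this example, and the factor matrices are assumed to come from any CP solver.

```python
import numpy as np


def count_redundant_pairs(factors, threshold=0.7):
    """Count component pairs that look redundant under an assumed criterion.

    factors: list of 2-D arrays, one per tensor mode, each of shape (dim_n, R);
             column r of every array belongs to CP component r.
    threshold: hypothetical cut-off on the absolute Pearson correlation.
    """
    n_components = factors[0].shape[1]
    # Assume every pair is redundant, then rule pairs out mode by mode.
    redundant = np.ones((n_components, n_components), dtype=bool)
    for factor in factors:
        # rowvar=False treats columns (components) as the variables.
        corr = np.abs(np.atleast_2d(np.corrcoef(factor, rowvar=False)))
        redundant &= corr >= threshold
    # Count each unordered pair once and skip the diagonal.
    upper = np.triu_indices(n_components, k=1)
    return int(redundant[upper].sum())


if __name__ == "__main__":
    # Three distinct components per mode (columns of an identity matrix).
    base = np.eye(4)[:, :3]
    print(count_redundant_pairs([base, base, base]))

    # Make component 2 a scaled copy of component 0 in every mode.
    dup = base.copy()
    dup[:, 2] = 3.0 * dup[:, 0]
    print(count_redundant_pairs([dup, dup, dup]))
```

Under this assumed criterion, one way to read the abstract's idea is to fit CP models with an increasing number of components and keep the largest model whose count of redundant pairs is zero. The selection rule actually used by NORMO is not given in the abstract.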
