Probabilistic Models for Incomplete Multi-dimensional Arrays

In multiway data, each sample is measured by multiple sets of correlated attributes. We develop a probabilistic framework, known as pTucker, for modeling structural dependency from partially observed multi-dimensional array data. Latent components associated with the individual array dimensions are retrieved jointly while the core tensor is integrated out. The resulting algorithm is capable of handling large-scale data sets. We verify the usefulness of this approach by comparing it against classical models on applications to modeling amino acid fluorescence, collaborative filtering, and a number of benchmark multiway array data sets.
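To make "integrating out the core tensor" concrete, the following is a minimal sketch for a three-way array, not the paper's implementation. It assumes (these are assumptions, not statements from the abstract) that the core entries have independent standard normal priors and that the noise is isotropic Gaussian with variance `noise_var`. Under those assumptions the vectorized array is zero-mean Gaussian with a Kronecker-structured covariance built from the per-dimension latent components, and partially observed data simply select the rows and columns of that covariance for the observed entries. All function names here are hypothetical.

```python
import numpy as np

def tucker_reconstruct(core, U1, U2, U3):
    """Tucker model: Y[i,j,k] = sum_abc core[a,b,c] * U1[i,a] * U2[j,b] * U3[k,c]."""
    return np.einsum("abc,ia,jb,kc->ijk", core, U1, U2, U3)

def marginal_covariance(U1, U2, U3, noise_var):
    """Covariance of Y.ravel() after integrating out the core.

    Assumption: core entries are i.i.d. N(0, 1) and noise is N(0, noise_var).
    With row-major (C-order) flattening, the covariance is
    kron(U1 U1^T, U2 U2^T, U3 U3^T) + noise_var * I.
    """
    K1, K2, K3 = U1 @ U1.T, U2 @ U2.T, U3 @ U3.T
    cov = np.kron(K1, np.kron(K2, K3))
    return cov + noise_var * np.eye(cov.shape[0])

def observed_covariance(cov, mask):
    """Restrict the full covariance to the observed entries (mask == True)."""
    idx = np.flatnonzero(mask.ravel())
    return cov[np.ix_(idx, idx)]
```

Forming the full Kronecker product is only practical for small arrays; it is used here purely to show the structure. A scalable method would have to avoid materializing this matrix.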
