Bayesian Nonparametric Models for Multiway Data Analysis

Tensor decomposition is a powerful computational tool for multiway data analysis. Many popular tensor decomposition approaches—such as the Tucker decomposition and CANDECOMP/PARAFAC (CP)—amount to multilinear factorization. They are insufficient to model (i) complex interactions between data entities, (ii) various data types (e.g., missing data and binary data), and (iii) noisy observations and outliers. To address these issues, we propose tensor-variate latent nonparametric Bayesian models for multiway data analysis. We name these models InfTucker. These new models essentially conduct Tucker decomposition in an infinite feature space. Unlike classical tensor decomposition models, our new approaches handle both continuous and binary data in a probabilistic framework. Unlike previous Bayesian models on matrices and tensors, our models are based on latent Gaussian or $t$ processes with nonlinear covariance functions. Moreover, on network data, our models reduce to nonparametric stochastic blockmodels and can be used to discover latent groups and predict missing interactions. To learn the models efficiently from data, we develop a variational inference technique and exploit properties of the Kronecker product for computational efficiency. Compared with a classical variational implementation, this technique reduces both time and space complexities by several orders of magnitude. On real multiway and network data, our new models achieved significantly higher prediction accuracy than state-of-the-art tensor decomposition methods and blockmodels.
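The Kronecker-product property most commonly used for this kind of speedup is the vec identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^{\top})$, which lets one multiply by a Kronecker-structured covariance without ever materializing it. The abstract does not spell out which identity InfTucker uses, so the sketch below is illustrative rather than a description of the paper's exact algorithm; all matrix names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mode factors and core, with shapes chosen so the
# dimensions line up: U1 is 4x3, U2 is 5x2, core W is 2x3.
U1 = rng.standard_normal((4, 3))
U2 = rng.standard_normal((5, 2))
W = rng.standard_normal((2, 3))

# Naive approach: explicitly build the 20x6 Kronecker matrix.
# For large modes this matrix is huge (product of all mode sizes).
naive = np.kron(U1, U2) @ W.reshape(-1, order="F")

# Vec identity: (U1 (x) U2) vec(W) = vec(U2 @ W @ U1^T), using
# column-major (Fortran-order) vectorization. Only small per-mode
# products are formed, never the Kronecker matrix itself.
fast = (U2 @ W @ U1.T).reshape(-1, order="F")

print(np.allclose(naive, fast))  # True
```

For $K$ modes of size $m$ each, the naive matrix has $m^{2K}$ entries, whereas the identity works with $K$ factors of $m^2$ entries apiece, which is the source of the orders-of-magnitude savings in both time and space.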