Bayesian Factorizations of Big Sparse Tensors

It has become routine to collect data that are structured as multiway arrays (tensors). There is an enormous literature on low-rank and sparse matrix factorizations, but extensions to the tensor case have received limited attention in statistics. The most common low-rank tensor factorization relies on parallel factor analysis (PARAFAC), which expresses a rank-k tensor as a sum of k rank-one tensors. In contingency table applications in which the sample size is far smaller than the number of cells in the table, the low-rank assumption is not sufficient and PARAFAC performs poorly. We induce an additional layer of dimension reduction by allowing the effective rank to vary across dimensions of the table. Taking a Bayesian approach, we place priors on terms in the factorization and develop an efficient Gibbs sampler for posterior computation. Theory is provided showing posterior concentration rates in high-dimensional settings, and the methods are shown to have excellent performance in simulations and several real data applications.
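
To make the PARAFAC idea concrete, the following minimal sketch builds the cell probabilities of a small contingency table as a weighted sum of rank-one tensors. It assumes the non-negative, latent-class form of PARAFAC that is common for categorical data; this parameterization, the function name `parafac_probability_tensor`, and all numbers are illustrative assumptions, not the paper's model or data.

```python
from itertools import product
from math import prod


def parafac_probability_tensor(weights, factors):
    """Return {cell: probability} for a PARAFAC (latent class) factorization.

    weights[h]       : weight of rank-one component h (non-negative, sums to 1)
    factors[j][h][c] : probability that variable j takes level c in component h
    (Hypothetical helper for illustration only.)
    """
    levels = [range(len(variable[0])) for variable in factors]
    tensor = {}
    for cell in product(*levels):
        # Each component contributes one rank-one term: its weight times
        # a product of one marginal probability per variable.
        tensor[cell] = sum(
            w * prod(factors[j][h][c] for j, c in enumerate(cell))
            for h, w in enumerate(weights)
        )
    return tensor


# Made-up example: 3 binary variables, rank k = 2.
weights = [0.6, 0.4]
factors = [
    [[0.9, 0.1], [0.2, 0.8]],  # variable 1: one row per component
    [[0.7, 0.3], [0.5, 0.5]],  # variable 2
    [[0.4, 0.6], [0.1, 0.9]],  # variable 3
]
tensor = parafac_probability_tensor(weights, factors)
```

Here every variable shares the same k components. The approach described in the abstract instead lets the effective rank differ across dimensions of the table, which this sketch does not attempt to reproduce.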
