Maximum Likelihood Estimation of a Low-Rank Probability Mass Tensor From Partial Observations

We consider the problem of estimating the Probability Mass Function (PMF) of a discrete random vector (RV) from partial observations, namely when some elements in each observed realization may be missing. Since the PMF takes the form of a multi-way tensor, under certain model assumptions the problem becomes closely associated with tensor factorization. Indeed, in recent studies it was shown that a low-rank PMF tensor can be fully recovered (under some mild conditions) by applying a low-rank (approximate) joint factorization to all estimated joint PMFs of subsets of fixed cardinality larger than two (e.g., triplets). The joint factorization is based on a Least Squares (LS) fit to the estimated lower-order sub-tensors. In this letter we take a different estimation approach by fitting the factorization directly to the observed partial data in the sense of the Kullback-Leibler divergence (KLD). Consequently, we avoid the need to select and directly estimate sub-tensors of a particular order, as we inherently apply proper weighting to all the available partial data. We show that our approach essentially attains the Maximum Likelihood estimate of the full PMF tensor (under the low-rank model) and therefore enjoys its well-known properties of consistency and asymptotic efficiency. In addition, based on the Bayesian interpretation of the low-rank model, we propose an Expectation-Maximization (EM) based approach, which is computationally cheap per iteration. Simulation results demonstrate the advantages of our proposed KLD-based hybrid approach (combining alternating-directions minimization with EM) over LS fitting of sub-tensors.
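
To make the modeling concrete, the following is a minimal sketch (in Python/NumPy) of an EM-type fit of a rank-R PMF tensor, parameterized as a mixture of product distributions, to realizations with missing entries. It illustrates the general technique rather than the letter's exact algorithm; the function name `em_low_rank_pmf`, the missing-entry convention (entries marked with -1), and all parameter names are hypothetical.

```python
import numpy as np

def em_low_rank_pmf(X, alphabet_sizes, R, n_iter=100, seed=0):
    """Illustrative EM sketch for a rank-R PMF tensor
    P(x_1,...,x_N) = sum_r lam[r] * prod_n A[n][x_n, r],
    fitted to partially observed realizations.

    X : (T, N) integer array of realizations; missing entries are -1.
    alphabet_sizes : list of the N alphabet sizes I_1, ..., I_N.
    R : assumed tensor rank (number of latent components).
    """
    rng = np.random.default_rng(seed)
    T, N = X.shape

    # Random initialization: loading vector lam and stochastic factor matrices A[n] (I_n x R).
    lam = rng.dirichlet(np.ones(R))
    A = [rng.dirichlet(np.ones(I), size=R).T for I in alphabet_sizes]

    for _ in range(n_iter):
        # E-step: posterior over the latent component per realization,
        # marginalizing out the missing entries (only observed ones contribute).
        log_post = np.tile(np.log(lam), (T, 1))
        for n in range(N):
            obs = X[:, n] >= 0
            log_post[obs] += np.log(A[n][X[obs, n], :] + 1e-12)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: update the prior and factor matrices from expected counts.
        lam = post.mean(axis=0)
        for n in range(N):
            obs = X[:, n] >= 0
            counts = np.zeros_like(A[n])
            np.add.at(counts, X[obs, n], post[obs])
            A[n] = counts / (counts.sum(axis=0, keepdims=True) + 1e-12)

    return lam, A
```

In this sketch each iteration only touches the entries that were actually observed in each realization, which is the sense in which an EM-type update can remain computationally cheap per iteration.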
