A general method for regularizing tensor decomposition methods via pseudo-data

Tensor decomposition methods allow us to learn the parameters of latent variable models by decomposing low-order moments of the data. A significant limitation of these algorithms is that no general method exists to regularize them; in the past, regularization has mostly been performed through bespoke modifications to the algorithms, tailored to the particular form of the desired regularizer. We present a general method for regularizing tensor decomposition methods that applies to any likelihood model learnable via tensor decomposition and to any differentiable regularization function, by supplementing the training data with pseudo-data. The pseudo-data are optimized to balance two terms: staying as close as possible to the true data and enforcing the desired regularization. On synthetic, semi-synthetic, and real data, we demonstrate that our method can improve inference accuracy and regularize toward a broad range of goals, including transfer learning, sparsity, interpretability, and orthogonality of the learned parameters.
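To make the idea concrete, here is a minimal sketch of the two-term objective the abstract describes: pseudo-data optimized to stay near the true data while pushing a differentiable regularizer on a data moment. The specific choices below (an L1 sparsity penalty on the empirical second moment, plain gradient descent with a hand-derived gradient) are illustrative assumptions, not the paper's actual objective or algorithm.

```python
import numpy as np

def regularized_pseudo_data(X, lam=5.0, lr=0.01, steps=500):
    """Optimize pseudo-data Y to balance two terms:
      (1) fidelity:       ||Y - X||_F^2           (stay close to the true data X)
      (2) regularization: lam * sum|M2(Y)|        (sparsify the second moment)
    where M2(Y) = Y.T @ Y / n. Illustrative sketch only; the paper's
    regularizers and moment systems may differ."""
    n = X.shape[0]
    Y = X.copy()
    for _ in range(steps):
        M2 = Y.T @ Y / n
        # d||Y - X||^2/dY = 2(Y - X);  d sum|M2|/dY = (2/n) Y @ sign(M2)
        grad = 2.0 * (Y - X) + lam * (2.0 / n) * Y @ np.sign(M2)
        Y -= lr * grad
    return Y

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = regularized_pseudo_data(X)
# The moments of Y shrink toward sparsity while Y stays near X;
# a downstream tensor decomposition would then be run on Y instead of X.
```

In the paper's general setting the analytic gradient would be replaced by automatic differentiation, which is what makes the approach applicable to any differentiable regularizer.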
