Regularizing Generative Models Using Knowledge of Feature Dependence

Generative modeling is a fundamental problem in machine learning with many potential applications. Learning generative models efficiently requires exploiting available prior knowledge as much as possible. In this paper, we propose a method that exploits prior knowledge of the relative dependence between features when learning generative models. Such knowledge is available, for example, when side information on the features is present. We incorporate this prior knowledge by forcing the marginals of the learned generative model to follow the prescribed relative dependence between features. To this end, we formulate a regularization term using a kernel-based dependence criterion. The proposed method can be incorporated straightforwardly into many optimization-based schemes for learning generative models, including variational autoencoders and generative adversarial networks. We demonstrate the effectiveness of the proposed method in experiments on multiple types of datasets and models.
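To make the idea of a kernel-based regularizer on relative feature dependence concrete, here is a minimal sketch under several assumptions that are ours, not taken from the text above. The dependence criterion is assumed to be the biased empirical HSIC (Hilbert-Schmidt Independence Criterion) with a Gaussian kernel whose bandwidth comes from the median heuristic. The prior knowledge is assumed to be encoded as hypothetical triplets `(i, j, k)` meaning "feature i should depend more on feature j than on feature k", and each violated triplet is assumed to incur a hinge penalty with a hypothetical `margin`. The functions `gaussian_gram`, `hsic`, and `relative_dependence_penalty` are illustrative names. The sketch only evaluates the penalty on a batch of samples with NumPy; in actual VAE or GAN training, the same computation would be written in an autodiff framework and added to the training loss with some weight.

```python
import numpy as np


def gaussian_gram(x, sigma=None):
    """Gaussian (RBF) Gram matrix for a 1-D sample vector x of length n."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    sq_dists = (x - x.T) ** 2
    if sigma is None:
        # Assumption: median heuristic, i.e. sigma = median pairwise distance.
        off_diag = sq_dists[np.triu_indices_from(sq_dists, k=1)]
        med = np.median(np.sqrt(off_diag))
        sigma = med if med > 0 else 1.0
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = len(x)
    K = gaussian_gram(x)
    L = gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


def relative_dependence_penalty(samples, triplets, margin=0.0):
    """Hinge penalty over hypothetical triplets (i, j, k) meaning
    'feature i should depend more on feature j than on feature k'.

    samples: (n, d) array of samples drawn from the generative model.
    """
    samples = np.asarray(samples, dtype=float)
    total = 0.0
    for i, j, k in triplets:
        dep_ij = hsic(samples[:, i], samples[:, j])
        dep_ik = hsic(samples[:, i], samples[:, k])
        # Penalize only when the prescribed ordering is violated (up to the margin).
        total += max(0.0, dep_ik - dep_ij + margin)
    return total


# Usage with synthetic stand-in "generated" samples.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = x0 + 0.1 * rng.normal(size=200)  # strongly dependent on x0
x2 = rng.normal(size=200)             # independent of x0
gen = np.stack([x0, x1, x2], axis=1)

print(relative_dependence_penalty(gen, [(0, 1, 2)]))  # ordering satisfied: no penalty
print(relative_dependence_penalty(gen, [(0, 2, 1)]))  # ordering violated: positive penalty
```

Because each penalty term involves only the samples of two or three features at a time, it constrains low-dimensional marginals of the model rather than the full joint distribution, which is what lets such a term be attached to any generator that produces samples.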
