Composite Likelihood Data Augmentation for Within-Network Statistical Relational Learning

The prevalence of datasets that can be represented as networks has recently fueled a great deal of work in the area of Relational Machine Learning (RML). Due to the statistical correlations between linked nodes in a network, many RML methods focus on predicting node features (i.e., labels) from the network relationships. However, many domains consist of a single, partially labeled network. In such settings, relational versions of Expectation Maximization (R-EM), which jointly learn parameters and infer the missing labels, can outperform methods that learn parameters from the labeled data alone and then apply them for inference on the unlabeled nodes. Although R-EM methods can significantly improve predictive performance in densely labeled networks, they do not achieve the same gains in sparsely labeled networks and can perform worse than RML methods. In this work, we show that the fixed-point methods R-EM uses for approximate learning and inference produce errors that prevent convergence in sparsely labeled networks. We then propose two methods that do not suffer from this problem. First, we develop a Relational Stochastic EM (R-SEM) method, which uses stochastic parameters that are less susceptible to approximation error. Second, we develop a Relational Data Augmentation (R-DA) method, which integrates over a range of stochastic parameter values for inference. Both R-SEM and R-DA can use any collective RML algorithm for learning and inference in partially labeled networks. We analyze their performance with two RML learners on four real-world datasets and show that they outperform independent learning, RML, and R-EM, particularly in sparsely labeled networks.
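To make the learning loop concrete, the following is a minimal sketch of the stochastic-EM idea in the within-network setting. Everything in it is illustrative: the six-node graph, the binary labels, and the one-parameter homophily model (P(y_i = y_j) = theta for each edge) are hypothetical stand-ins for the paper's collective RML learners. The point it demonstrates is that each iteration draws a single sampled completion of the missing labels (a stochastic E-step) and re-estimates the parameter from that completion, so the parameter trajectory is a sequence of stochastic draws rather than a deterministic fixed-point iteration.

```python
import random

# A minimal sketch of the Relational Stochastic EM (R-SEM) idea, assuming a
# binary within-network classification task and a toy one-parameter homophily
# model: P(two linked nodes share a label) = theta. The graph, labels, and
# model are hypothetical placeholders, not the paper's actual RML learners.

random.seed(0)

# Toy partially labeled network: node -> neighbors (symmetric), node -> label
# (None marks an unlabeled node whose label must be inferred).
edges = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = {0: 1, 1: 1, 2: None, 3: None, 4: 0, 5: 0}

def sample_label(node, y, theta):
    """Draw a label from P(y_node | neighbor labels, theta) -- one Gibbs step."""
    score1 = score0 = 1.0
    for nbr in edges[node]:
        if y[nbr] == 1:
            score1 *= theta
            score0 *= 1 - theta
        else:
            score1 *= 1 - theta
            score0 *= theta
    p1 = score1 / (score0 + score1)
    return 1 if random.random() < p1 else 0

def estimate_theta(y):
    """M-step on a sampled completion: smoothed fraction of agreeing edges."""
    agree = total = 0
    for u, nbrs in edges.items():
        for v in nbrs:
            if u < v:  # count each undirected edge once
                total += 1
                agree += int(y[u] == y[v])
    return (agree + 1) / (total + 2)  # Laplace smoothing

theta, draws = 0.5, []
# Initialize unlabeled nodes randomly; observed labels stay fixed throughout.
y = {n: (l if l is not None else random.randint(0, 1)) for n, l in labels.items()}
for it in range(200):
    # Stochastic E-step: one Gibbs sweep over the unlabeled nodes.
    for node, l in labels.items():
        if l is None:
            y[node] = sample_label(node, y, theta)
    # M-step on the sampled completion; theta is now a stochastic draw,
    # not a deterministic fixed-point update.
    theta = estimate_theta(y)
    if it >= 50:  # discard burn-in draws
        draws.append(theta)

print("mean homophily estimate:", sum(draws) / len(draws))
```

Averaging label predictions over the retained parameter draws, instead of committing to a single converged estimate, corresponds roughly to the data-augmentation (R-DA) variant described above, which integrates over a range of stochastic parameter values at inference time.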
