A Marginalized Denoising Method for Link Prediction in Relational Data

All five forms of the expectation can be written in closed analytical form. The basic idea is that a product of matrices can be expanded into a sum of individual terms, e.g. $(AXX^\top)_{ij} = \sum_{k,l} A_{ik}X_{kl}X_{jl}$. Suppose each entry $X_{ij}$ is blanked out with probability $p$, and let $q = 1 - p$ be the survival probability. Then the expectation of a single corrupted term $A_{ik}\tilde{X}_{kl}\tilde{X}_{jl}$ is $q^2 A_{ik}X_{kl}X_{jl}$ if $k \neq j$, and $q A_{ij}X_{jl}X_{jl}$ if $k = j$. In other words, when two or more random variables $\tilde{X}^{(t)}_{ij}$ share the same subscripts, they are no longer uncorrelated, so the summation must account for these cases by adjusting for the difference between the normal case and the special cases.

For quadratic terms in $X$ there is only one special case to consider. For cubic terms there are 4 cases: three of them leave two uncorrelated variables in each term, and one leaves only a single uncorrelated variable. For quartic terms there are 14 cases.

In the following, we derive the analytical forms of all expectation terms listed in Table 1. We denote by $D_0$ the normal case, in which no two $X$ factors share the same subscripts, and by $D_{12}$ the special case in which the subscripts of the first and the second $X$ are equal, leaving only one uncorrelated random variable. For quadratic terms we have

$$E[f(\tilde{X})] = q^2 D_0 - q(q - 1) D_{12},$$

where $f(\tilde{X})$ denotes a term that is quadratic in $\tilde{X}$. There are two such quadratic forms, and we compute $D_0$ and $D_{12}$ for each as follows. (1) For $E[\tilde{X}A\tilde{X}]$, we have
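The quadratic identity can be checked numerically on the worked example $(AXX^\top)_{ij} = \sum_{k,l} A_{ik}X_{kl}X_{jl}$. The sketch below is only an illustration under stated assumptions, not the paper's implementation: it assumes blank-out noise zeroes each entry of $X$ independently with probability $p = 1 - q$ and does not rescale the survivors, it takes $D_0$ as the unrestricted sum $AXX^\top$ and $D_{12}$ as the $k = j$ terms $A_{ij}\sum_l X_{jl}^2$, and it uses the leading coefficient $q^2$ (two independent surviving entries). The function names `axxt`, `closed_form`, and `exact_expectation` are hypothetical. The exact expectation is obtained by enumerating every blank-out mask of a small matrix.

```python
import itertools
import random


def axxt(A, X):
    """Return A X X^T through the explicit sum over k, l of A[i][k] * X[k][l] * X[j][l]."""
    n, m = len(X), len(X[0])
    return [[sum(A[i][k] * X[k][l] * X[j][l] for k in range(n) for l in range(m))
             for j in range(n)]
            for i in range(n)]


def closed_form(A, X, q):
    """Assumed closed form: q^2 * D0 - q * (q - 1) * D12."""
    n = len(X)
    D0 = axxt(A, X)  # unrestricted sum, every pair treated as uncorrelated
    row_sq = [sum(x * x for x in row) for row in X]  # sum_l X[j][l]^2
    D12 = [[A[i][j] * row_sq[j] for j in range(n)] for i in range(n)]  # k == j terms
    return [[q ** 2 * D0[i][j] - q * (q - 1) * D12[i][j] for j in range(n)]
            for i in range(n)]


def exact_expectation(A, X, q):
    """Exact E[A Xt Xt^T] by enumerating all blank-out masks (small X only)."""
    n, m = len(X), len(X[0])
    total = [[0.0] * n for _ in range(n)]
    for bits in itertools.product((0, 1), repeat=n * m):
        kept = sum(bits)
        prob = q ** kept * (1 - q) ** (n * m - kept)  # probability of this mask
        Xt = [[X[k][l] * bits[k * m + l] for l in range(m)] for k in range(n)]
        term = axxt(A, Xt)
        for i in range(n):
            for j in range(n):
                total[i][j] += prob * term[i][j]
    return total


random.seed(0)
n, m, q = 2, 3, 0.7
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
X = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]

exact = exact_expectation(A, X, q)
formula = closed_form(A, X, q)
max_gap = max(abs(exact[i][j] - formula[i][j]) for i in range(n) for j in range(n))
print(max_gap)  # should be at the level of floating-point rounding error
```

Enumeration costs $2^{nm}$ masks, so it is only practical for tiny matrices; its purpose is to confirm the coefficient bookkeeping, after which the closed form can be used at any size.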
