Estimating latent feature-feature interactions in large feature-rich graphs

Real-world complex networks describe connections between objects; in reality, those objects are often endowed with some kind of features. How does the presence or absence of such features interplay with the network link structure? Although the situation described here is truly ubiquitous, there is a limited body of research dealing with large graphs of this kind. Many previous works considered homophily as the only possible transmission mechanism translating node features into links. Other authors instead developed more sophisticated models that can handle complex feature interactions but do not scale to very large networks. We expand on the MGJ model, where interactions between pairs of features can foster or discourage link formation. In this work, we investigate how to estimate the latent feature-feature interactions in this model. We propose two solutions: the first assumes feature independence and is essentially based on Naive Bayes; the second, which relaxes the independence assumption, is based on perceptrons. In fact, we show that the model equation can be recast as the prediction rule of a perceptron. We analyze how classical results for perceptrons can be interpreted in this context; then, we define a fast and simple perceptron-like algorithm for this task, which can process $10^8$ links in minutes. We then compare these two techniques, first on synthetic datasets that follow our model, gaining evidence that the naive independence assumption is detrimental in practice. Second, we consider a real, large-scale citation network where each node (i.e., paper) can be described by different types of characteristics; there, our algorithm can assess how well each set of features explains the links, and thus find meaningful latent feature-feature interactions.
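To make the perceptron view concrete, the following is a minimal sketch, not the algorithm defined in the paper. It assumes (the abstract does not state the model equation) that a link from node $i$ to node $j$ is scored as $z_i^\top W z_j$, where $z_i$ is the binary feature vector of node $i$ and $W$ is the latent feature-feature matrix. Under that assumption, $W$ plays the role of a perceptron weight vector and the outer product of $z_i$ and $z_j$ plays the role of its input. The function name, the $\pm 1$ labels, and the toy data are all hypothetical.

```python
import numpy as np

def perceptron_interactions(Z, pairs, labels, epochs=5):
    """Estimate a feature-feature matrix W with plain perceptron updates.

    Assumed setup (illustrative only):
    Z      : (n_nodes, n_features) binary node-feature matrix
    pairs  : list of (i, j) node pairs, read as candidate links i -> j
    labels : +1 if the pair is linked, -1 otherwise
    """
    n_features = Z.shape[1]
    W = np.zeros((n_features, n_features))
    for _ in range(epochs):
        for (i, j), y in zip(pairs, labels):
            # Assumed prediction rule: the sign of z_i^T W z_j.
            score = Z[i] @ W @ Z[j]
            if y * score <= 0:
                # Mistake-driven update on the outer product z_i z_j^T.
                W += y * np.outer(Z[i], Z[j])
    return W

# Toy data: nodes 0 and 1 carry feature 0, nodes 2 and 3 carry feature 1.
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
pairs = [(0, 2), (1, 3), (0, 1), (2, 3), (2, 0)]
labels = [1, 1, -1, -1, -1]
W = perceptron_interactions(Z, pairs, labels)
```

On this toy input, links exist only from nodes with feature 0 to nodes with feature 1, so the learned entry `W[0, 1]` ends up positive and the other entries negative. Each update touches only the entries of `W` indexed by features that are active in the two endpoints, which is what keeps a single pass over the links cheap when feature vectors are sparse.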
