Inferring Vertex Properties from Topology in Large Networks

Network topology not only reveals tightly connected “communities” but also gives cues to more subtle properties of the vertices. We introduce a simple probabilistic latent-variable model that finds either latent blocks or more graded structures, depending on hyperparameters. With collapsed Gibbs sampling it can be estimated for networks of 10^6 vertices or more, and the number of latent components adapts to the data through a Dirichlet process prior. Applied to the social network of a music recommendation site (Last.fm), the model yields reasonable combinations of musical genres from the network topology, as revealed by subsequent matching of the latent structure with the listening habits of the participants. The advantages of the generative nature of the model are explicit handling of uncertainty in sparse data, easy interpretability and extensibility, and straightforward adaptation to applications with incomplete data.
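
How a Dirichlet process prior lets the number of latent components adapt to the data can be illustrated through its Chinese restaurant process representation. The following is a minimal sketch under stated assumptions, not the inference procedure of the model itself: the function name `sample_crp` and the parameters `n_items`, `alpha`, and `seed` are hypothetical, and the code only draws a partition from the prior, without any network likelihood or Gibbs updates.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Draw a random partition of n_items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (an illustrative name):
    larger values make opening a new component more likely.
    """
    rng = random.Random(seed)
    assignments = []  # assignments[i] = component index of item i
    counts = []       # counts[k] = number of items currently in component k
    for _ in range(n_items):
        # An existing component k is chosen with weight counts[k];
        # a new component is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new component
        else:
            counts[k] += 1     # join an existing component
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    for alpha in (0.5, 5.0):
        _, counts = sample_crp(1000, alpha, seed=1)
        print(f"alpha={alpha}: {len(counts)} components")
```

In this sketch the number of components is never fixed in advance; it grows as items are added, at a rate governed by `alpha`. In a collapsed Gibbs sampler the same prior weights would be multiplied by a likelihood term for the observed data, which is omitted here.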