Social bookmarking services like Delicious let users annotate their bookmarks with “tags”, freely chosen keywords that facilitate later retrieval. The success of these systems and the central storage of the data they generate have led to vast networked datasets composed of (document, user, tag) triples. These datasets can be thought of as annotated traces of human online behaviour, and as such have recently been the subject of a significant amount of research, showing, e.g., systematic convergence properties [6]. This interest is further sparked by their peculiar theoretical properties: if the triples are interpreted as edges, the resulting graphs are 3-partite 3-uniform (or 3,3-) hypergraphs, i.e. generalized graphs in which each edge connects exactly three nodes, one from each of three distinct partitions. Many established methods from complex network analysis need to be generalized before they can be properly applied to these structures [1, 5, 10].
Here, we discuss the theoretical intricacies and practical benefits of generalizing community detection. Community detection aims to identify groups of closely connected nodes in graphs. Identifying related nodes may be useful in itself, but it also helps in examining a complex network's macrostructure: the nodes of each group are collapsed into a single cluster, and the network of clusters is examined instead. Given the complexity, noise and sheer size of social bookmarking datasets, such reductions appear crucial for distilling the latent information they contain.
Out of a vast space of possible approaches [4], we examine the particularly well-established approach of modularity optimization [12] and its possible extensions to hypergraphs. We first discuss the challenges of generalizing modularity. After reviewing earlier work, we then introduce a new approach that works natively on hypergraphs.
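To make the objects above concrete, the following is a minimal sketch in Python; the helper names (`build_hypergraph`, `flatten_to_graph`, `modularity`) and the toy triples are hypothetical and not taken from the text. It shows (i) how (document, user, tag) triples form a 3,3-hypergraph, and (ii) standard Newman-Girvan graph modularity applied to a naive flattening in which every hyperedge is replaced by a triangle. The flattening is only an assumed stand-in baseline for illustration, not the generalized or native hypergraph modularity discussed here; it discards which three nodes co-occurred, which is one reason a proper generalization is needed.

```python
from collections import defaultdict
from itertools import combinations

PARTITIONS = ("document", "user", "tag")


def build_hypergraph(triples):
    """Turn (document, user, tag) triples into a 3-partite 3-uniform hypergraph.

    Nodes are (partition, label) pairs, so the same string used as, say, a
    user name and a tag becomes two distinct nodes. Every hyperedge joins
    exactly one node from each of the three partitions.
    """
    nodes = {part: set() for part in PARTITIONS}
    hyperedges = set()
    for triple in triples:
        edge = tuple(zip(PARTITIONS, triple))
        for part, label in edge:
            nodes[part].add(label)
        hyperedges.add(edge)  # duplicate triples collapse into one hyperedge
    return nodes, hyperedges


def flatten_to_graph(hyperedges):
    """Naive baseline (an assumption, not the paper's method): replace each
    hyperedge by the triangle on its three nodes.

    Duplicate pairs are merged, so the result is a simple unweighted graph.
    """
    edges = set()
    for edge in hyperedges:
        for u, v in combinations(edge, 2):
            edges.add(frozenset((u, v)))
    return [tuple(e) for e in edges]


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of a simple undirected graph.

    Q = sum over communities c of [L_c / m - (d_c / (2m))**2], where m > 0 is
    the number of edges, L_c the number of edges inside c and d_c the total
    degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    triples = [
        ("doc1", "alice", "python"),
        ("doc1", "bob", "python"),
        ("doc2", "carol", "jazz"),
    ]
    nodes, hyperedges = build_hypergraph(triples)
    graph = flatten_to_graph(hyperedges)
    # Hypothetical partition: one community per connected component.
    group_a = {"doc1", "alice", "bob", "python"}
    community = {
        (part, label): "A" if label in group_a else "B"
        for part, labels in nodes.items()
        for label in labels
    }
    print(len(graph), modularity(graph, community))  # prints 8 0.46875
```

Putting all nodes into a single community yields Q = 0, so positive values indicate more intra-community edges than the degree-preserving null model would predict.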
After evaluating the different approaches, we turn to the practical side and introduce an interactive visualization tool that not only illustrates the differences between the algorithms, but more generally demonstrates the richness of the structures found in social bookmarking data.
[1] M. Newman et al. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, 2006.
[2] C. Bauckhage et al. Analyzing Social Bookmarking Systems: A del.icio.us Cookbook. 2008.
[3] Jimeng Sun et al. MetaFac: community discovery via relational hypergraph factorization. KDD, 2009.
[4] Bernardo A. Huberman et al. Usage patterns of collaborative tagging systems. J. Inf. Sci., 2006.
[5] M. Newman et al. Finding community structure in very large networks. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, 2004.
[6] Hyperincident connected components of tagging networks. HT '09, 2009.
[7] K. Obermayer et al. Towards Community Detection in k-Partite k-Uniform Hypergraphs. 2009.
[8] Vittorio Loreto et al. Network properties of folksonomies. AI Commun., 2007.
[9] Santo Fortunato et al. Community detection in graphs. arXiv, 2009.
[10] Tsuyoshi Murata. Detecting communities from tripartite networks. WWW '10, 2010.
[11] Guido Caldarelli et al. Random hypergraphs and their applications. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, 2009.
[12] Leon Danon et al. Comparing community structure identification. cond-mat/0505245, 2005.
[13] Tsuyoshi Murata et al. Modularities for bipartite networks. HT '09, 2009.
[14] Charu C. Aggarwal et al. Graph Clustering. Encyclopedia of Machine Learning and Data Mining, 2010.